Supervised Learning Classification Project: AllLife Bank Personal Loan Campaign¶
Problem Statement¶
Context¶
AllLife Bank is a US bank that has a growing customer base. The majority of these customers are liability customers (depositors) with varying sizes of deposits. The number of customers who are also borrowers (asset customers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business and in the process, earn more through the interest on loans. In particular, the management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).
A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9% success. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio.
As a data scientist at AllLife Bank, you have to build a model that will help the marketing department identify the potential customers who have a higher probability of purchasing the loan.
Objective¶
To predict whether a liability customer will buy a personal loan, to understand which customer attributes are most significant in driving purchases, and to identify which segments of customers to target.
Data Dictionary¶
- ID: Customer ID
- Age: Customer's age in completed years
- Experience: Years of professional experience
- Income: Annual income of the customer (in thousand dollars)
- ZIP Code: Home address ZIP code
- Family: Family size of the customer
- CCAvg: Average spending on credit cards per month (in thousand dollars)
- Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
- Mortgage: Value of house mortgage, if any (in thousand dollars)
- Personal_Loan: Did this customer accept the personal loan offered in the last campaign? (0: No, 1: Yes)
- Securities_Account: Does the customer have a securities account with the bank? (0: No, 1: Yes)
- CD_Account: Does the customer have a certificate of deposit (CD) account with the bank? (0: No, 1: Yes)
- Online: Does the customer use internet banking facilities? (0: No, 1: Yes)
- CreditCard: Does the customer use a credit card issued by any other bank (excluding AllLife Bank)? (0: No, 1: Yes)
Importing necessary libraries¶
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# Libraries to build the logistic regression model
from sklearn.linear_model import LogisticRegression
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To perform statistical analysis
import scipy.stats as stats
# Sequential feature selection (SFS)
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
# To get different metric scores
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
ConfusionMatrixDisplay,
make_scorer,
)
pip install uszipcode==1.0.1 'sqlalchemy-mate<2'
Successfully installed SQLAlchemy-1.4.54 atomicwrites-1.4.1 fuzzywuzzy-0.18.0 haversine-2.9.0 pathlib-mate-1.3.2 sqlalchemy-mate-1.4.28.4 uszipcode-1.0.1
Loading the dataset¶
# Mounting Google Drive to access the data file
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
#Loading data
data = pd.read_csv('/content/drive/My Drive/UTA - AIML/Loan_Modelling.csv')
# copying data to another variable to avoid any changes to original data
loan = data.copy()
Data Overview¶
- Observations
- Sanity checks
# Display 10 random sample rows of the dataset
loan.sample(10)
| ID | Age | Experience | Income | ZIPCode | Family | CCAvg | Education | Mortgage | Personal_Loan | Securities_Account | CD_Account | Online | CreditCard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2952 | 2953 | 33 | 8 | 182 | 94065 | 1 | 8.6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1085 | 1086 | 51 | 26 | 11 | 92612 | 2 | 0.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3854 | 3855 | 31 | 6 | 83 | 94720 | 4 | 1.8 | 3 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4954 | 4955 | 45 | 19 | 22 | 94904 | 3 | 1.5 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4950 | 4951 | 47 | 23 | 19 | 90089 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1959 | 1960 | 50 | 24 | 130 | 95833 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3738 | 3739 | 54 | 28 | 45 | 95008 | 3 | 1.4 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1905 | 1906 | 25 | -1 | 112 | 92507 | 2 | 2.0 | 1 | 241 | 0 | 0 | 0 | 1 | 0 |
| 4143 | 4144 | 55 | 31 | 20 | 94720 | 2 | 0.3 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1543 | 1544 | 52 | 26 | 101 | 93407 | 2 | 2.4 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
# we will drop the ID column as it carries no predictive value
loan.drop("ID", axis=1, inplace=True)
# Display 5 random sample rows of the dataset to check dropping of ID column
loan.sample(5)
| Age | Experience | Income | ZIPCode | Family | CCAvg | Education | Mortgage | Personal_Loan | Securities_Account | CD_Account | Online | CreditCard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4989 | 24 | 0 | 38 | 93555 | 1 | 1.0 | 3 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3653 | 52 | 27 | 32 | 92521 | 2 | 2.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2579 | 52 | 27 | 23 | 92780 | 1 | 0.4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1544 | 39 | 15 | 24 | 92123 | 1 | 1.0 | 1 | 116 | 0 | 0 | 0 | 1 | 1 |
| 2681 | 37 | 11 | 35 | 94609 | 2 | 0.8 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
# viewing the shape of the data set
loan.shape
(5000, 13)
#viewing the dataset attributes
loan.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5000 entries, 0 to 4999 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Age 5000 non-null int64 1 Experience 5000 non-null int64 2 Income 5000 non-null int64 3 ZIPCode 5000 non-null int64 4 Family 5000 non-null int64 5 CCAvg 5000 non-null float64 6 Education 5000 non-null int64 7 Mortgage 5000 non-null int64 8 Personal_Loan 5000 non-null int64 9 Securities_Account 5000 non-null int64 10 CD_Account 5000 non-null int64 11 Online 5000 non-null int64 12 CreditCard 5000 non-null int64 dtypes: float64(1), int64(12) memory usage: 507.9 KB
All variables are of type int64 except CCAvg, which is float64. Next, let us display the five-point summary of the data.
#Checking for null and duplicate values
loan.isnull().sum()
| 0 | |
|---|---|
| Age | 0 |
| Experience | 0 |
| Income | 0 |
| ZIPCode | 0 |
| Family | 0 |
| CCAvg | 0 |
| Education | 0 |
| Mortgage | 0 |
| Personal_Loan | 0 |
| Securities_Account | 0 |
| CD_Account | 0 |
| Online | 0 |
| CreditCard | 0 |
No Null values in the dataset
#checking for duplicate values
loan.duplicated().sum()
np.int64(0)
No duplicated values in the dataset
# viewing the variables datatypes
loan.dtypes
| 0 | |
|---|---|
| Age | int64 |
| Experience | int64 |
| Income | int64 |
| ZIPCode | int64 |
| Family | int64 |
| CCAvg | float64 |
| Education | int64 |
| Mortgage | int64 |
| Personal_Loan | int64 |
| Securities_Account | int64 |
| CD_Account | int64 |
| Online | int64 |
| CreditCard | int64 |
#checking for 5 point summary
loan.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Age | 5000.0 | 45.338400 | 11.463166 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 5000.0 | 20.104600 | 11.467954 | -3.0 | 10.0 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | 73.774200 | 46.033729 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| ZIPCode | 5000.0 | 93169.257000 | 1759.455086 | 90005.0 | 91911.0 | 93437.0 | 94608.0 | 96651.0 |
| Family | 5000.0 | 2.396400 | 1.147663 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | 1.937938 | 1.747659 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | 1.881000 | 0.839869 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | 56.498800 | 101.713802 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Personal_Loan | 5000.0 | 0.096000 | 0.294621 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Securities_Account | 5000.0 | 0.104400 | 0.305809 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| CD_Account | 5000.0 | 0.060400 | 0.238250 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Online | 5000.0 | 0.596800 | 0.490589 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| CreditCard | 5000.0 | 0.294000 | 0.455637 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
Observations:
- The minimum Age is 23, the maximum is 67, and the average is 45 years. The data seems reasonable and roughly uniformly distributed.
- The minimum Experience is -3 years, which is not plausible and requires a closer look.
- The minimum Income is USD 8K, the maximum is USD 224K, and the average is about USD 74K (median USD 64K). The distribution appears right-skewed.
- ZIP codes should not be treated as integer values since they encode location; this requires preprocessing.
- The minimum Family size is 1, the maximum is 4, and the average is about 2. The values are reasonable and fairly uniformly distributed; with only 4 distinct values, the variable can be treated as categorical.
- The minimum CCAvg is USD 0K (possibly customers without credit cards), the maximum is USD 10K, and the average is about USD 1.9K. The distribution appears right-skewed.
- Education is a categorical variable: 1: Undergrad; 2: Graduate; 3: Advanced/Professional.
- The minimum Mortgage is USD 0K, the maximum is USD 635K, and the median is USD 0K (mean about USD 56K). The distribution is heavily right-skewed.
- Personal_Loan is a categorical variable: 1 if the customer accepted the personal loan offered in the last campaign, 0 otherwise.
- Securities_Account is a categorical variable: 1 if the customer has a securities account, 0 otherwise.
- CD_Account is a categorical variable: 1 if the customer has a CD account, 0 otherwise.
- Online is a categorical variable: 1 if the customer uses online banking, 0 otherwise.
- CreditCard is a categorical variable: 1 if the customer uses a credit card issued by another bank, 0 otherwise.
Data Preprocessing¶
- Missing value treatment
- Feature engineering (if needed)
- Outlier detection and treatment (if needed)
- Preparing data for modeling
- Any other preprocessing steps (if needed)
1. Experience column (Treating the -ve value rows)
# displaying how many rows carries a negative value
print(f'There are {len(loan[loan["Experience"] < 0])} rows with a negative value')
There are 52 rows with a negative value
# let us plot the distribution of the Experience variable and check its skewness (if any) to decide on the best imputation approach
sns.displot(x=loan["Experience"], kde=True)
<seaborn.axisgrid.FacetGrid at 0x7d0ed985e780>
The distribution is roughly uniform. Accordingly, the approach to impute the negative values is to replace them with the median.
# replacing each negative Experience value with the column median
loan["Experience"] = loan["Experience"].apply(
lambda x: loan["Experience"].median() if x < 0 else x
)
# checking the count of negative values to confirm the imputation succeeded
len(loan[loan["Experience"] < 0])
0
No more negative values in the variable Experience
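An equivalent vectorized alternative: `Series.mask` replaces values where a condition holds, without a Python-level lambda per row. A minimal sketch on hypothetical values; one subtlety worth noting is that computing the median over only the non-negative entries avoids letting the invalid values bias the imputation (the cell above takes the median of the full column, which barely matters here with only 52 negatives out of 5,000).

```python
import pandas as pd

# Toy Series standing in for the Experience column (hypothetical values)
exp = pd.Series([5, -1, 12, -3, 20])

# Median of the valid entries only, so the invalid negatives don't bias it
median_exp = exp[exp >= 0].median()

# Series.mask replaces values where the condition holds, no per-row lambda
exp_clean = exp.mask(exp < 0, median_exp)
print(exp_clean.tolist())  # [5.0, 12.0, 12.0, 12.0, 20.0]
```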
2. Zip code conversion
Using the "uszipcode" library, we extract the "City" and "State" from the zipcode of every customer
#import the searchengine from uszipcode library
import uszipcode
from uszipcode import SearchEngine
#create the search engine
search = SearchEngine()
#create a function to get the city from the zipcode
def get_city(x):
return search.by_zipcode(x).city
#create a function to get the state from the zipcode
def get_state(x):
return search.by_zipcode(x).state
Download /root/.uszipcode/simple_db.sqlite from https://github.com/MacHu-GWU/uszipcode-project/releases/download/1.0.1.db/simple_db.sqlite ... 1.00 MB downloaded ... 2.00 MB downloaded ... 3.00 MB downloaded ... 4.00 MB downloaded ... 5.00 MB downloaded ... 6.00 MB downloaded ... 7.00 MB downloaded ... 8.00 MB downloaded ... 9.00 MB downloaded ... 10.00 MB downloaded ... 11.00 MB downloaded ... Complete!
#create 2 empty lists to fill with city and state for each customer
the_city_ = []
the_state_=[]
#create a for loop to loop on the zipcodes, extract the city and state and fill the lists
for i in np.arange(0,5000):
try:
city = get_city(loan["ZIPCode"].iloc[i])
the_city_.append(city)
state = get_state(loan["ZIPCode"].iloc[i])
the_state_.append(state)
except:
the_city_.append(np.nan)
the_state_.append(np.nan)
continue
#Adding the city and state lists as new columns in the dataset
loan['City'] = the_city_
loan['State'] = the_state_
loan.head()
| Age | Experience | Income | ZIPCode | Family | CCAvg | Education | Mortgage | Personal_Loan | Securities_Account | CD_Account | Online | CreditCard | City | State | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 1.0 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Pasadena | CA |
| 1 | 45 | 19.0 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Los Angeles | CA |
| 2 | 39 | 15.0 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Berkeley | CA |
| 3 | 35 | 9.0 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | San Francisco | CA |
| 4 | 35 | 8.0 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | Northridge | CA |
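The lookup loop above queries the database once per row. Since the 5,000 customers share only a few hundred distinct ZIP codes, a faster pattern is to query each unique code once and broadcast the result with `.map`. A self-contained sketch, with a small hypothetical dict standing in for the `uszipcode` SearchEngine:

```python
import pandas as pd

# Hypothetical stand-in for uszipcode's SearchEngine.by_zipcode: a small dict
# keeps this sketch self-contained and runnable without the database download.
ZIP_DB = {91107: ("Pasadena", "CA"), 90089: ("Los Angeles", "CA")}

def lookup(z):
    # returns (city, state); (None, None) when the ZIP is unknown
    return ZIP_DB.get(z, (None, None))

zips = pd.Series([91107, 90089, 91107, 91107])

# Query each unique ZIP exactly once, then broadcast the results with .map --
# with 5,000 rows but only a few hundred distinct codes, this avoids
# thousands of redundant lookups.
unique_results = {z: lookup(z) for z in zips.unique()}
city = zips.map(lambda z: unique_results[z][0])
state = zips.map(lambda z: unique_results[z][1])

print(city.tolist())  # ['Pasadena', 'Los Angeles', 'Pasadena', 'Pasadena']
```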
# exploring the null values introduced into the dataframe from the city and state lists
loan['City'].isnull().value_counts()
| count | |
|---|---|
| City | |
| False | 4966 |
| True | 34 |
loan['State'].isnull().value_counts()
| count | |
|---|---|
| State | |
| False | 4966 |
| True | 34 |
34 values are missing
#extracting the zipcodes that are returning the nan values
zip_nan = loan[loan['City'].isnull()]
zip_nan['ZIPCode'].value_counts()
| count | |
|---|---|
| ZIPCode | |
| 92717 | 22 |
| 96651 | 6 |
| 92634 | 5 |
| 93077 | 1 |
There are 4 unique ZIP codes returning NaN in the city column.
# Checking the same for the state column
zip_nan_state = loan[loan['State'].isnull()]
zip_nan_state['ZIPCode'].value_counts()
| count | |
|---|---|
| ZIPCode | |
| 92717 | 22 |
| 96651 | 6 |
| 92634 | 5 |
| 93077 | 1 |
The missing city and state values share the same ZIP codes. I will manually look up the 4 unique codes on Google and then replace the NaN entries with the actual values.
# Create a dictionary mapping each ZIP code to its Googled City and State
zip_dict = {'92717':'Irvine, CA',
'96651':'Rudno nad Hronom, BC',
'92634':'Fullerton, CA',
'93077':'Ventura, CA'
}
#Create a function to fill the missing values
def fill_nan(data, indxs, value, column):
    data.loc[indxs, column] = value  # .loc assignment avoids chained-indexing pitfalls
#Fill in the missing city and state values for each looked-up ZIP code
for i in zip_dict.keys():
    indxs = loan[loan['ZIPCode'] == int(i)].index
    fill_nan(loan, indxs, zip_dict[i].split(',')[0], 'City')
    fill_nan(loan, indxs, zip_dict[i].split(',')[1], 'State')
#confirm null values are removed in City and State columns
loan.isnull().sum()
| 0 | |
|---|---|
| Age | 0 |
| Experience | 0 |
| Income | 0 |
| ZIPCode | 0 |
| Family | 0 |
| CCAvg | 0 |
| Education | 0 |
| Mortgage | 0 |
| Personal_Loan | 0 |
| Securities_Account | 0 |
| CD_Account | 0 |
| Online | 0 |
| CreditCard | 0 |
| City | 0 |
| State | 0 |
#Display value counts of states
loan['State'].value_counts()
| count | |
|---|---|
| State | |
| CA | 4966 |
| CA | 28 |
| BC | 6 |
There are two main state labels, CA (the majority) and BC, but ' CA' should be merged into 'CA' and ' BC' into 'BC' for consistency (the leading space comes from the comma split above)
loan['State'].replace(' CA','CA',inplace=True)
loan['State'].replace(' BC','BC',inplace=True)
loan['State'].value_counts()
| count | |
|---|---|
| State | |
| CA | 4994 |
| BC | 6 |
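For reference, the whitespace variants could also be normalized in a single pass with `str.strip`, which removes the leading space left by the comma split regardless of the state label. A toy sketch:

```python
import pandas as pd

# Toy column with the whitespace variants left by splitting "City, ST"
state = pd.Series(['CA', ' CA', ' BC', 'CA'])

# One pass normalizes every variant, no per-value replace() calls needed
state_clean = state.str.strip()
print(state_clean.value_counts().to_dict())  # {'CA': 3, 'BC': 1}
```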
#Display value counts of Cities
loan['City'].value_counts()
| count | |
|---|---|
| City | |
| Los Angeles | 375 |
| San Diego | 269 |
| San Francisco | 257 |
| Berkeley | 241 |
| Sacramento | 148 |
| ... | ... |
| Sierra Madre | 1 |
| Ladera Ranch | 1 |
| Sausalito | 1 |
| Tahoe City | 1 |
| Stinson Beach | 1 |
245 rows × 1 columns
There are 245 different cities in the dataset, most customers are from Los Angeles, San Diego, San Francisco, Berkeley and Sacramento
Finally, we drop the ZIP Code column
loan.drop('ZIPCode', axis =1, inplace=True)
Exploratory Data Analysis.¶
- EDA is an important part of any project involving data.
- It is important to investigate and understand the data better before building a model with it.
- A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
- A thorough analysis of the data, in addition to the questions mentioned below, should be done.
Questions:
- What is the distribution of mortgage attribute? Are there any noticeable patterns or outliers in the distribution?
- How many customers have credit cards?
- What are the attributes that have a strong correlation with the target attribute (personal loan)?
- How does a customer's interest in purchasing a loan vary with their age?
- How does a customer's interest in purchasing a loan vary with their education?
1. Univariate analysis
i. Visualizing the numerical data¶
From the five-point summary, it is observed that Age, Experience, Income, CCAvg and Mortgage are numerical and continuous in nature, hence we will define and apply a function to plot the histogram & boxplot for each variable
#we will define a function to plot the boxplot and histogram for all numerical variables
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
"""
Boxplot and histogram combined
data: dataframe
feature: dataframe column
figsize: size of figure (default (12,7))
kde: whether to the show density curve (default False)
bins: number of bins for histogram (default None)
"""
f2, (ax_box2, ax_hist2) = plt.subplots(
nrows=2, # Number of rows of the subplot grid= 2
sharex=True, # x-axis will be shared among all subplots
gridspec_kw={"height_ratios": (0.25, 0.75)},
figsize=figsize,
) # creating the 2 subplots
sns.boxplot(
data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
) # boxplot will be created and a star will indicate the mean value of the column
sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
) if bins else sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2
) # For histogram
ax_hist2.axvline(
data[feature].mean(), color="green", linestyle="--"
) # Add mean to the histogram
ax_hist2.axvline(
data[feature].median(), color="black", linestyle="-"
) # Add median to the histogram
Age¶
histogram_boxplot(loan,'Age')
This answers question 4 in the EDA section of the project¶
Observations:
The minimum age is 23, the maximum is 67, and the average is 45 years
The distribution is roughly uniform
The largest number of customers falls in the 58-60 age range, with additional peaks at 30-32, 38-40, 44-46 and 52-54
There are no outliers observed.
Experience¶
histogram_boxplot(loan,'Experience')
Observation
The minimum Experience is 0 years, while the maximum is 43 years and the mean is approximately 20 years
The data roughly follows a uniform distribution, with peaks at 12-14 years and 28-30 years
There are no outliers observed
Income¶
histogram_boxplot(loan,'Income')
Observation
The minimum income is USD 8,000, the maximum is USD 224,000, and the average is approx USD 73,800
The distribution is right-skewed
There is a noticeable number of outliers, but they appear consistent with the provided data
No action may be required for the outlier treatment
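The boxplot whiskers flag outliers using the 1.5×IQR rule; the same rule can be applied programmatically to count them. A sketch on hypothetical income-like values in thousands (in the notebook, `loan['Income']` would be used instead):

```python
import pandas as pd

# Toy income-like values in thousands (hypothetical, not the actual column)
s = pd.Series([8, 39, 64, 98, 120, 224])

# 1.5*IQR rule, the same rule the boxplot whiskers use
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(len(outliers))  # only the 224 value falls outside the whiskers
```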
CCAvg - Credit Card Avg¶
histogram_boxplot(loan,'CCAvg')
Observation
The min CCAvg is USD 0
This may reflect customers who do not have credit cards
The maximum CCAvg is approx USD 10,000 per month and the average is approx USD 1,900
The distribution is right-skewed, with a number of outliers that appear consistent with the provided data
No action is recommended to the outliers
Mortgage¶
histogram_boxplot(loan,'Mortgage')
Observation
The minimum Mortgage is USD 0, the maximum is USD 635,000, and the median is USD 0 (mean approx USD 56,500)
The distribution is heavily right-skewed
It would be better to separate the USD 0 mortgages from the non-zero values in order to visualize the distribution more clearly, and then plot the data again
#extracting the customers with mortgage values > 0
mortgage = loan[loan['Mortgage']>0]
print(f'There are {len(mortgage)} customers with a mortgage, forming {round((len(mortgage)/5000)*100)}% of the dataset')
There are 1538 customers with a mortgage, forming 31% of the dataset
#plotting mortgage of the customers again
histogram_boxplot(mortgage,'Mortgage')
This answers question 1 in the EDA section of the project¶
Observation
The mortgage distribution for customers holding a mortgage is right-skewed, ranging from a minimum of approx USD 99,000 to a maximum of USD 635,000, with a mean between USD 180,000 and USD 200,000. This is much easier to read than the previous visualization.
ii. Visualizing the categorical data¶
From the five-point summary, it is observed that Family, Education, Personal_Loan, Securities_Account, CD_Account, Online and CreditCard are categorical in nature (along with the derived City and State columns), hence we will define and apply a function to plot a labelled barplot for each variable
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
"""
Barplot with percentage at the top
data: dataframe
feature: dataframe column
perc: whether to display percentages instead of count (default is False)
n: displays the top n category levels (default is None, i.e., display all levels)
"""
total = len(data[feature]) # length of the column
count = data[feature].nunique()
if n is None:
plt.figure(figsize=(count + 2, 6))
else:
plt.figure(figsize=(n + 2, 6))
plt.xticks(rotation=90, fontsize=15)
ax = sns.countplot(
data=data,
x=feature,
palette="Paired",
order=data[feature].value_counts().index[:n].sort_values(),
)
for p in ax.patches:
if perc == True:
label = "{:.1f}%".format(
100 * p.get_height() / total
) # percentage of each class of the category
else:
label = p.get_height() # count of each level of the category
x = p.get_x() + p.get_width() / 2 # width of the plot
y = p.get_height() # height of the plot
ax.annotate(
label,
(x, y),
ha="center",
va="center",
size=12,
xytext=(0, 5),
textcoords="offset points",
) # annotate the percentage
plt.show() # show the plot
Cities Zip Codes¶
plt.figure(figsize=(15,45))
sns.countplot(data=loan, y='City', order=loan['City'].value_counts().index)
<Axes: xlabel='count', ylabel='City'>
Observation
Approx. 25.8% of the customers in the dataset reside in Los Angeles, San Diego, San Francisco, Berkeley and Sacramento (the top 5 cities in the dataset)
The top city is Los Angeles, where 7.5% of the customers reside
Overview of the top cities:

| City | No. of customers |
|---|---|
| Los Angeles | 375 |
| San Diego | 269 |
| San Francisco | 257 |
| Berkeley | 241 |
| Sacramento | 148 |
Family size¶
labeled_barplot(loan,'Family',perc=True)
Observation
The most common family size is 1 (29.4%)
Followed by size 2 (25.9%)
Then size 4 (24.4%)
Finally, family size 3 accounts for 20.2% of the dataset
Education¶
labeled_barplot(loan,'Education',perc=True)
This answers question 5 in the EDA section of the project¶
Observation
41.9% of customers are 1: Undergrad
28.1% of customers are 2: Graduate
30.0% of customers are 3: Advanced/Professional
Personal Loan¶
labeled_barplot(loan,'Personal_Loan',perc=True)
This answers question 3 in the EDA section of the project¶
Observation
90.4% of customers did not accept a loan
9.6% of customers accepted a loan
This 9.6% acceptance rate is consistent with the "over 9%" conversion rate reported for last year's campaign
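Because only ~9.6% of customers accepted the loan, any train/test split should preserve this class ratio; `train_test_split`'s `stratify` argument does exactly that. A sketch on hypothetical data mimicking the imbalance:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data mimicking the ~10% positive rate of Personal_Loan
df = pd.DataFrame({"Income": range(100), "Personal_Loan": [1] * 10 + [0] * 90})
X = df[["Income"]]
y = df["Personal_Loan"]

# stratify=y keeps the positive rate identical in the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
print(round(y_tr.mean(), 2), round(y_te.mean(), 2))  # 0.1 0.1
```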
CD_Account¶
labeled_barplot(loan,'CD_Account',perc=True)
Observation
94% of customers do not have a CD account
Only 6% have a CD account
Online¶
labeled_barplot(loan,'Online',perc=True)
Observation
59.7% of customers use the online banking services
40.3% of customers do not use the online banking services
CreditCard¶
labeled_barplot(loan,'CreditCard',perc=True)
This answers question 2 in the EDA section of the project¶
Observation
70.6% of customers do not use a credit card issued by a different bank
29.4% of customers use a credit card issued by a different bank
2. Bivariate analysis
Let us start with a simple pair plot to see whether any correlations emerge between the dataset variables
plt.figure(figsize=(15,7))
# sns.pairplot(loan, diag_kind='kde')
sns.pairplot(loan, hue="Personal_Loan")
plt.show()
<Figure size 1500x700 with 0 Axes>
Observations:
The orange spots in the plot represent the customers who accepted a personal loan and the blue spots show the ones who did not.
a. From the univariate analysis of the personal loan dataset done above, it was observed that only 9.6% of the customers accepted the personal loan, hence this observation is consistent with the pair plot where the majority of the spots are blue
b. It is observed that there is a very strong linear correlation between Age and Experience in the dataset
c. It is also observed that there is a moderate correlation between Income and CCAvg, as shown in the pair plot above
d. High concentrations of customers who accepted a personal loan are observed at the following points in the dataset:
- Higher income levels (approx USD 100,000 and above)
- Higher CCAvg (approx USD 3,000 and above)
- Higher mortgage values (approx USD 300,000 and above)
- Customers with CD accounts
- Customers who use credit cards issued by banks other than the bank under review
- Families of sizes 3 and 4
- Customers with Education 2 (Graduate) and 3 (Advanced/Professional)
Having an early sense of which variables carry medium-to-high predictive power for the classification models is very useful.
plt.figure(figsize=(15,10))
sns.heatmap(loan.select_dtypes(include=np.number).corr(),annot=True,cmap='YlGnBu')
<Axes: >
Observation
The heatmap above is largely consistent with the pair plot:
The correlation between Age and Experience is very high (0.98)
The correlation between Income and CCAvg is moderate (0.65)
Income and Mortgage are also positively correlated, though weakly (correlation below 0.25)
All other correlation values are too small to warrant further consideration
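As a compact complement to reading the heatmap, the correlations with the target can be ranked directly. A sketch on hypothetical data (in the notebook, `loan.select_dtypes(include=np.number)` would replace the toy frame):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the numeric loan columns (hypothetical values):
# loan uptake here is driven by income, while age is unrelated.
rng = np.random.default_rng(0)
income = rng.normal(75, 45, 200)
age = rng.normal(45, 11, 200)
df = pd.DataFrame({
    "Income": income,
    "Age": age,
    "Personal_Loan": (income > 100).astype(int),
})

# Rank every numeric feature by its correlation with the target --
# a compact complement to reading the full heatmap.
corr_with_target = (
    df.corr(numeric_only=True)["Personal_Loan"]
    .drop("Personal_Loan")
    .sort_values(ascending=False)
)
print(corr_with_target.index[0])  # Income is the strongest correlate here
```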
# countplot of personal loan uptake split by education level
plt.figure(figsize=(10,5))
plt.title('CountPlot:Education Category who have taken Personal Loans')
sns.countplot(loan, x="Personal_Loan", hue='Education')
plt.show()
Observation
The countplot shows that education level matters for loan uptake:
Among customers who accepted a loan, the Graduate and Advanced/Professional levels are over-represented relative to their overall shares, consistent with the pair plot observation above
Undergraduates form the largest education group among customers who did not take a loan
import plotly.express as px
Figure_3D=px.scatter_3d(loan,x='Personal_Loan', y='Age',z='Income',color='Age');
Figure_3D.show()
Figure_3D.write_html('/content/drive/My Drive/Addedum image to project.html')
Observation
The plotly figure above is a 3-D image that helps illustrate the relationship between age, income, and customers' willingness to accept a personal loan
NB: To see this 3-D image, please see the Addendum 1 uploaded as a separate html file.
### function to plot distributions wrt our target variables
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(10, 7))
    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of Target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of Target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t Target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t Target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
#checking the columns heads in the dataframe
loan.columns
Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
'Mortgage', 'Personal_Loan', 'Securities_Account', 'CD_Account',
'Online', 'CreditCard', 'City', 'State'],
dtype='object')
Age Vs Personal Loan¶
distribution_plot_wrt_target(loan,'Age','Personal_Loan')
Observation
The mean age of customers accepting and not accepting personal loans is very close, around 45 years
As seen in the pair plot, Age shows little relation with the target variable, so its predictive power is likely small
Experience Vs Personal Loan¶
distribution_plot_wrt_target(loan,'Experience','Personal_Loan')
Observation
The mean experience of customers accepting and not accepting personal loans is very close, around 20 years
As observed in the pair plot, Experience shows little relation with the target variable, so its predictive power is likely small
Income Vs Personal Loan¶
distribution_plot_wrt_target(loan,'Income','Personal_Loan')
Observations:
As observed before, Income varies greatly between customers who accepted and those who did not accept a personal loan
The mean income of customers who did not accept the loan is approx. USD 65,000
The mean income of customers who accepted the loan is approx. USD 145,000
Therefore, income level has a high impact on the customer's decision to take a personal loan
The higher the income, the more likely the customer is to accept a personal loan
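The per-class income means quoted above come from a simple group-by; a minimal sketch, with toy numbers chosen to mirror the ~65 vs ~145 (thousand USD) gap seen in the data:

```python
import pandas as pd

# Toy stand-in for `loan` (values illustrative; the real means come from the dataset)
df = pd.DataFrame({
    "Personal_Loan": [0, 0, 0, 1, 1],
    "Income":        [60, 70, 65, 140, 150],  # in thousand dollars
})

# Mean income per target class -- the same call applied to `loan`
# reproduces the ~65 vs ~145 thousand-dollar gap noted above
income_by_loan = df.groupby("Personal_Loan")["Income"].mean()
print(income_by_loan)
```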
Mortgage Vs Personal Loan¶
distribution_plot_wrt_target(loan,'Mortgage','Personal_Loan')
Observation
Customers paying a mortgage are more likely to accept a personal loan
The distribution is heavily right-skewed because most customers pay no mortgage
Given the skew, it is worth plotting the distribution for mortgage-paying customers only
This should give clearer insight into the mean mortgage value for customers who did and did not accept a personal loan
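The `mortgage` dataframe used in the next cell is not defined in this section; presumably it is the subset of customers with a non-zero mortgage. A minimal sketch of that filter, with a toy frame standing in for `loan`:

```python
import pandas as pd

# Toy stand-in for `loan` (values illustrative)
df = pd.DataFrame({
    "Mortgage":      [0, 0, 101, 150, 635],
    "Personal_Loan": [0, 1, 0, 1, 1],
})

# Keep only customers actually paying a mortgage
mortgage = df[df["Mortgage"] > 0]
print(mortgage.shape)
```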
#distribution of customers paying mortgage only
distribution_plot_wrt_target(mortgage,'Mortgage','Personal_Loan')
Observation
The mean mortgage of customers not accepting a personal loan (approx. USD 150,000) is much lower than that of customers accepting one (approx. USD 290,000)
The higher the mortgage value, the more likely a customer is to accept a personal loan
CCAvg Vs Personal Loan¶
distribution_plot_wrt_target(loan,'CCAvg','Personal_Loan')
Observation
The mean CCAvg of customers not accepting a personal loan (approx. USD 1,600 per month) is much lower than that of customers accepting one (approx. USD 4,000)
Customers who accept a personal loan spend more on their credit cards each month
b. Plotting distribution of categorical variables Vs Target variable (Personal Loan)¶
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 6))
    # a single legend call suffices; an earlier duplicate call was overridden
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
Family Vs Personal Loan¶
stacked_barplot(loan,'Family','Personal_Loan')
Personal_Loan     0    1   All
Family
All            4520  480  5000
4              1088  134  1222
3               877  133  1010
1              1365  107  1472
2              1190  106  1296
------------------------------------------------------------------------------------------------------------------------
Observation
Families of size 3 or 4 show a higher propensity to accept a personal loan (roughly 11-13% conversion versus 7-8% for sizes 1 and 2)
The relationship is not strictly monotonic, but larger families are generally more willing to accept personal loans
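The per-category acceptance ratios behind the stacked bars can be read directly from a row-normalised crosstab; a sketch with toy data (the counts here are illustrative, not the real ones):

```python
import pandas as pd

# Toy stand-in for `loan` with a handful of customers per family size
df = pd.DataFrame({
    "Family":        [1, 1, 2, 2, 3, 3, 4, 4, 3, 4],
    "Personal_Loan": [0, 0, 0, 1, 1, 1, 1, 0, 0, 1],
})

# normalize="index" makes each row sum to 1, so column 1 is the
# conversion rate per family size -- the quantity the stacked bars show
rates = pd.crosstab(df["Family"], df["Personal_Loan"], normalize="index")
print(rates[1].sort_values(ascending=False))
```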
Education Vs Personal Loan¶
stacked_barplot(loan,'Education','Personal_Loan')
Personal_Loan     0    1   All
Education
All            4520  480  5000
3              1296  205  1501
2              1221  182  1403
1              2003   93  2096
------------------------------------------------------------------------------------------------------------------------
Observation
Customers with education levels 2 and 3 are more willing to accept a personal loan than customers with education level 1
The highest being customers with education level 3
Securities Account Vs Personal Loan¶
stacked_barplot(loan,'Securities_Account','Personal_Loan')
Personal_Loan         0    1   All
Securities_Account
All                4520  480  5000
0                  4058  420  4478
1                   462   60   522
------------------------------------------------------------------------------------------------------------------------
Observation
- Customers with a securities account are slightly more willing to accept a personal loan (about 11.5% conversion versus 9.4% without one)
CD_Account Vs Personal Loan¶
stacked_barplot(loan,'CD_Account','Personal_Loan')
Personal_Loan     0    1   All
CD_Account
All            4520  480  5000
0              4358  340  4698
1               162  140   302
------------------------------------------------------------------------------------------------------------------------
Observation
- Customers with a CD account are far more willing to accept a personal loan (about 46% conversion versus 7% without one)
Online Vs Personal Loan¶
stacked_barplot(loan,'Online','Personal_Loan')
Personal_Loan     0    1   All
Online
All            4520  480  5000
1              2693  291  2984
0              1827  189  2016
------------------------------------------------------------------------------------------------------------------------
Observation
Customers who use internet banking and those who do not behave very similarly
The probability of accepting a personal loan is nearly the same for both groups (about 9-10%), as the plot above shows
CreditCard Vs Personal Loan¶
stacked_barplot(loan,'CreditCard','Personal_Loan')
Personal_Loan     0    1   All
CreditCard
All            4520  480  5000
0              3193  337  3530
1              1327  143  1470
------------------------------------------------------------------------------------------------------------------------
Observation
- Customers who use credit cards issued by other banks and those who do not are almost equally likely to accept a personal loan
City Vs Personal Loan¶
For better visualization, I will set a count threshold to keep the most common cities among the customers and group the rest under 'others'.
The resulting dataframe will be used to visualize how the most common cities vary with the target, Personal_Loan.
#assigning the threshold of 50
cities = loan['City'].value_counts()
threshold = 50
cities[cities.values >= threshold]
| City | count |
|---|---|
| Los Angeles | 375 |
| San Diego | 269 |
| San Francisco | 257 |
| Berkeley | 241 |
| Sacramento | 148 |
| Palo Alto | 130 |
| Stanford | 127 |
| Davis | 121 |
| La Jolla | 112 |
| Santa Barbara | 103 |
| San Jose | 96 |
| Irvine | 80 |
| Santa Clara | 77 |
| Monterey | 72 |
| Pasadena | 71 |
| Oakland | 55 |
| Newbury Park | 53 |
| Claremont | 52 |
| Menlo Park | 52 |
| Santa Cruz | 51 |
| El Segundo | 50 |
#the threshold of 50 looks appropriate, so let's extract the names of the cities
cities_list = cities[cities.values >= threshold].index.tolist()
print("Cities names taken into consideration:\n", len(cities_list))
print(cities_list)
Cities names taken into consideration: 21 ['Los Angeles', 'San Diego', 'San Francisco', 'Berkeley', 'Sacramento', 'Palo Alto', 'Stanford', 'Davis', 'La Jolla', 'Santa Barbara', 'San Jose', 'Irvine', 'Santa Clara', 'Monterey', 'Pasadena', 'Oakland', 'Newbury Park', 'Claremont', 'Menlo Park', 'Santa Cruz', 'El Segundo']
#we create a copy of the created data frame as we are doing this transformation only for better visualization
loan_t = loan.copy()
loan_t['City'] = loan_t['City'].apply(lambda x:x if x in cities_list else 'others')
#function to plot horizontal stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a horizontal stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="barh", stacked=True, figsize=(count + 5, 9))
    # a single legend call suffices; an earlier duplicate call was overridden
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
#plotting a horizontal stacked bar plot of the transformed dataset
#(the function creates its own figure, so no external plt.figure call is needed)
stacked_barplot(loan_t,'City','Personal_Loan')
Personal_Loan       0    1   All
City
All              4520  480  5000
others           2181  227  2408
Los Angeles       337   38   375
Berkeley          214   27   241
San Diego         248   21   269
San Francisco     238   19   257
Palo Alto         114   16   130
La Jolla           97   15   112
Stanford          114   13   127
Sacramento        135   13   148
Santa Clara        65   12    77
San Jose           85   11    96
Irvine             69   11    80
Pasadena           61   10    71
Santa Barbara      95    8   103
Santa Cruz         44    7    51
Monterey           66    6    72
Davis             115    6   121
El Segundo         45    5    50
Oakland            50    5    55
Claremont          48    4    52
Menlo Park         48    4    52
Newbury Park       51    2    53
------------------------------------------------------------------------------------------------------------------------
Observation
Santa Clara shows the highest ratio of customers accepting personal loans (about 16%)
Pasadena has the second-highest ratio
Irvine comes third
State Vs Personal_Loan¶
stacked_barplot(loan,'State','Personal_Loan')
Personal_Loan     0    1   All
State
CA             4514  480  4994
All            4520  480  5000
BC                6    0     6
------------------------------------------------------------------------------------------------------------------------
Observations
- All 480 loan acceptors live in California; none of the six British Columbia customers accepted, though the BC sample is far too small to draw conclusions from
Key Insights based on EDA¶
The five-number summary for all variables, including the added City and State columns:
loan.describe(include='all').T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 5000.0 | NaN | NaN | NaN | 45.3384 | 11.463166 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 5000.0 | NaN | NaN | NaN | 20.3276 | 11.253035 | 0.0 | 11.0 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | NaN | NaN | NaN | 73.7742 | 46.033729 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| Family | 5000.0 | NaN | NaN | NaN | 2.3964 | 1.147663 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | NaN | NaN | NaN | 1.937938 | 1.747659 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | NaN | NaN | NaN | 1.881 | 0.839869 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | NaN | NaN | NaN | 56.4988 | 101.713802 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Personal_Loan | 5000.0 | NaN | NaN | NaN | 0.096 | 0.294621 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Securities_Account | 5000.0 | NaN | NaN | NaN | 0.1044 | 0.305809 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| CD_Account | 5000.0 | NaN | NaN | NaN | 0.0604 | 0.23825 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Online | 5000.0 | NaN | NaN | NaN | 0.5968 | 0.490589 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| CreditCard | 5000.0 | NaN | NaN | NaN | 0.294 | 0.455637 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| City | 5000 | 245 | Los Angeles | 375 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| State | 5000 | 2 | CA | 4994 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
EDA Final Observations
Key observations from the univariate analysis:
The min Age is 23, the max is 67 and the average is 45 years
The Age distribution is roughly uniform
The min Experience is 0 years, the max is 43 years and the mean is approx. 20 years
The min CCAvg is USD 0 (customers who do not use credit cards), the max is approx. USD 10,000 per month and the average is approx. USD 1,900
69% of the customers pay no mortgage; the remaining 31% pay from approx. USD 99,000 up to USD 635,000, with a mean of roughly USD 180,000-200,000
The top 5 cities are Los Angeles (1st), San Diego (2nd), San Francisco (3rd), Berkeley (4th) and Sacramento (5th)
The top state is CA
The most common family size is 1 (29.4%), followed by size 2 (25.9%), size 4 (24.4%) and size 3 (20.2%)
Customer education is distributed as follows: 41.9% are 1: Undergrad, 28.1% are 2: Graduate, 30.0% are 3: Advanced/Professional
94% of customers do not have a CD account; only 6% do
40.3% of customers do not use online banking; 59.7% do
70.6% of customers do not hold a credit card from another bank; 29.4% do
The target variable Personal_Loan shows that 90.4% of customers did not accept a loan and 9.6% accepted
Key Insights and Observations on the multivariate analysis:
Correlation between variables
The correlation between Age and Experience is very high (value = 0.98)
The correlation between Income and CCAvg is lower (value = 0.65)
All other correlation values are quite small for further consideration
The effect of variables on the target variable
| Variable | Effect on target variable (Personal_Loan) |
|---|---|
| Income | The higher the income, the more likely the customer is to accept a personal loan |
| CCAvg | The higher the monthly credit-card spending, the more likely the customer is to accept a personal loan |
| Education | The higher the education level, the more willing the customer is to accept a personal loan |
| Mortgage | The higher the mortgage value, the more likely the customer is to accept a personal loan |
| City | Santa Clara shows the highest acceptance ratio, followed by Pasadena and then Irvine |
| State | Customers residing in CA are more willing to accept personal loans |
| Securities_Account | Customers with a securities account are slightly more willing to accept a personal loan |
| CD_Account | Customers with a CD account are much more willing to accept a personal loan |
| Online | No clear effect on the target variable observed |
| Age | No clear effect on the target variable observed |
| Experience | No clear effect on the target variable observed |
| CreditCard | No clear effect on the target variable observed |
Expected target dependencies
High importance is expected for: Income, Family, CCAvg, Education, Mortgage, Securities_Account and CD_Account
Low importance is expected for: Age, Experience, CreditCard and Online
Model Building¶
Data Preparation for modelling¶
Previously applied data pre-processing steps:
ID column: dropped
Experience column: negative values replaced by the median
ZIP Code column: split into City and State, then dropped
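A sketch of how those three steps might look in code, with a toy frame and a hypothetical `ZIPCode` column standing in for the raw data (the real ZIP-to-City/State lookup is omitted):

```python
import pandas as pd

# Toy raw frame (columns and values illustrative only)
raw = pd.DataFrame({
    "ID":         [1, 2, 3],
    "Experience": [-2, 10, 20],
    "ZIPCode":    [90089, 92037, 94720],
})

# 1. Drop the identifier column
df = raw.drop(columns="ID")

# 2. Replace negative Experience values with the column median
median_exp = df["Experience"].median()
df["Experience"] = df["Experience"].mask(df["Experience"] < 0, median_exp)

# 3. ZIPCode would be mapped to City/State via a lookup table (omitted),
#    after which the raw ZIPCode column is dropped
df = df.drop(columns="ZIPCode")
print(df)
```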
1. Creating the dummy variables¶
#checking columns in the data
loan.columns
Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
'Mortgage', 'Personal_Loan', 'Securities_Account', 'CD_Account',
'Online', 'CreditCard', 'City', 'State'],
dtype='object')
#Creating a dummy dataframe for the model
df_model = pd.get_dummies(
loan,
columns=[
"Education",
"City",
"State",
],
drop_first=True,
)
df_model.head()
df_model.head() output: 5 rows × 258 columns. The leading columns are Age, Experience, Income, Family, CCAvg, Mortgage, Personal_Loan, Securities_Account, CD_Account, Online and CreditCard, followed by boolean dummy columns Education_2, Education_3, one City_* dummy per city (City_Alameda through City_Yucaipa) and State_CA. (Full output truncated for readability.)
#checking the shape of the dataset
df_model.shape
(5000, 258)
#checking the data types
df_model.dtypes.value_counts()
| dtype | count |
|---|---|
| bool | 247 |
| int64 | 9 |
| float64 | 2 |
Observation:
No object data types remain; the 247 dummy columns are boolean, which scikit-learn treats as numeric, so the dataset is ready for modelling
2. Splitting the data into training and test sets¶
#creating train and test dataset
X = df_model.drop('Personal_Loan',axis=1)
y = df_model['Personal_Loan']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
#Checking the train and test dataset sizes for confirmation
print(f'X_train shape:{X_train.shape}')
print(f'X_test shape:{X_test.shape}')
X_train shape:(4000, 257)
X_test shape:(1000, 257)
#checking the percentage of target variable classes in the train and test sets
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
Personal_Loan
0    0.905
1    0.095
Name: proportion, dtype: float64
Percentage of classes in test set:
Personal_Loan
0    0.9
1    0.1
Name: proportion, dtype: float64
Observation:
Consistent with the univariate analysis above: class 1 makes up about 9.5% of the training set and 10% of the test set, so the split roughly preserves the class imbalance
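Because the positive class is rare, the plain split above preserves its proportion only by chance; passing `stratify=y` to `train_test_split` guarantees matching class ratios in both sets. A minimal sketch on synthetic labels (the ~9.6% positive rate mirrors the dataset, the data itself is made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))               # synthetic predictors
y = (rng.random(5000) < 0.096).astype(int)   # ~9.6% positive class

# stratify=y keeps the class ratio essentially identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)
print(y_tr.mean(), y_te.mean())  # near-identical proportions
```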
Model Evaluation Criterion¶
Model can make wrong predictions as:
Predicting a customer will accept a loan but in reality the customer would not accept a loan. Impact - Loss of resources
Predicting a customer will not accept a loan but in reality the customer would have accepted a loan. Impact - Loss of opportunity
Which case is more important?
Predicting that a customer will not accept a loan when they actually would have accepted one means a lost opportunity (a false negative, FN).
How to reduce this loss (False Negatives)?
Recall should be maximized: the greater the recall, the fewer false negatives the model makes.
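Concretely, recall = TP / (TP + FN), so every avoided false negative raises it directly. A small check with hand-built labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # one FN, one FP

# sklearn's confusion_matrix flattens in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
assert recall_score(y_true, y_pred) == tp / (tp + fn)  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))
```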
We start by defining helper functions to compute the different metrics and the confusion matrix for the models
## Function to calculate recall score
def get_recall_score(model, predictors, target):
"""
model: classifier
predictors: independent variables
target: dependent variable
"""
prediction = model.predict(predictors)
return recall_score(target, prediction)
## Function to calculate confusion matrix
def confusion_matrix_sklearn(model, predictors, target):
"""
To plot the confusion_matrix with percentages
model: classifier
predictors: independent variables
target: dependent variable
"""
y_pred = model.predict(predictors)
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
Model Building¶
1. Logistic Regression¶
Define the functions that support the performance evaluation of the logistic regression model. These are:
a. confusion_matrix_sklearn_with_threshold (builds the confusion matrix of the classification model)
b. model_performance_classification_sklearn_with_threshold (computes the different metrics)
c. plot_prec_recall_vs_tresh (plots precision and recall vs. threshold)
# defining a function to plot the confusion_matrix of a classification model built using sklearn
def confusion_matrix_sklearn_with_threshold(model, predictors, target, threshold=0.5):
"""
To plot the confusion_matrix, based on the threshold specified, with percentages
model: classifier
predictors: independent variables
target: dependent variable
threshold: threshold for classifying the observation as class 1
"""
pred_prob = model.predict_proba(predictors)[:, 1]
pred_thres = pred_prob > threshold
y_pred = np.round(pred_thres)
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn_with_threshold(model, predictors, target, threshold=0.5):
"""
Function to compute different metrics, based on the threshold specified, to check classification model performance
model: classifier
predictors: independent variables
target: dependent variable
threshold: threshold for classifying the observation as class 1
"""
# predicting using the independent variables
pred_prob = model.predict_proba(predictors)[:, 1]
pred_thres = pred_prob > threshold
pred = np.round(pred_thres)
acc = accuracy_score(target, pred) # to compute Accuracy
recall = recall_score(target, pred) # to compute Recall
precision = precision_score(target, pred) # to compute Precision
f1 = f1_score(target, pred) # to compute F1-score
# creating a dataframe of metrics
df_perf = pd.DataFrame(
{
"Accuracy": acc,
"Recall": recall,
"Precision": precision,
"F1": f1,
},
index=[0],
)
return df_perf
# defining a function to compute thresholds
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
plt.plot(thresholds, precisions[:-1], "b--", label="precision")
plt.plot(thresholds, recalls[:-1], "g--", label="recall")
plt.xlabel("Threshold")
plt.legend(loc="upper left")
plt.ylim([0, 1])
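These helpers pair naturally with `sklearn.metrics.precision_recall_curve`, which returns precision/recall arrays one element longer than the thresholds array (which is why the function above slices with `[:-1]`). A sketch on a synthetic fit (the data is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# imbalanced synthetic data (~90% negative class) to mimic the loan problem
X, y = make_classification(n_samples=500, weights=[0.9], random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

scores = clf.predict_proba(X)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(y, scores)

# precisions/recalls have one more entry than thresholds
assert len(precisions) == len(thresholds) + 1
# plot_prec_recall_vs_tresh(precisions, recalls, thresholds)  # as defined above
```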
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is applied as it is faster for high-dimensional data
lg = LogisticRegression(solver="newton-cg", random_state=1)
model = lg.fit(X_train, y_train)
Locating the coefficients¶
# checking the coefficients and intercept of the model
coef_df = pd.DataFrame(np.append(lg.coef_, lg.intercept_),
index=X_train.columns.tolist() + ["Intercept"],
columns=["Coefficients"],
)
coef_df
| Coefficients | |
|---|---|
| Age | 0.073051 |
| Experience | -0.068624 |
| Income | 0.058481 |
| Family | 0.638125 |
| CCAvg | 0.161678 |
| ... | ... |
| City_Woodland Hills | 0.755017 |
| City_Yorba Linda | -0.036893 |
| City_Yucaipa | -0.006452 |
| State_CA | -0.006848 |
| Intercept | -14.691445 |
258 rows × 1 columns
Coefficient interpretations¶
#sorting in descending order to identify the most influential variables
coef_df.sort_values(by='Coefficients',ascending=False)
| Coefficients | |
|---|---|
| Education_3 | 3.710503 |
| Education_2 | 3.472068 |
| CD_Account | 3.139787 |
| City_Los Gatos | 1.214717 |
| City_Whittier | 0.915968 |
| ... | ... |
| CreditCard | -0.831228 |
| City_Carlsbad | -0.868982 |
| City_Tustin | -0.870770 |
| City_Livermore | -0.885393 |
| Intercept | -14.691445 |
258 rows × 1 columns
#The positive coefficients
coef_df[coef_df['Coefficients']>0].sort_values(by='Coefficients',ascending=False)
| Coefficients | |
|---|---|
| Education_3 | 3.710503 |
| Education_2 | 3.472068 |
| CD_Account | 3.139787 |
| City_Los Gatos | 1.214717 |
| City_Whittier | 0.915968 |
| City_Martinez | 0.894860 |
| City_Oak View | 0.888115 |
| City_Greenbrae | 0.858732 |
| City_Cardiff By The Sea | 0.819786 |
| City_Irvine | 0.790847 |
| City_West Sacramento | 0.785846 |
| City_Woodland Hills | 0.755017 |
| City_Richmond | 0.743801 |
| City_Banning | 0.733911 |
| City_Campbell | 0.730927 |
| City_San Ysidro | 0.720148 |
| City_Sunnyvale | 0.710214 |
| City_Venice | 0.709283 |
| City_Novato | 0.695282 |
| City_Bakersfield | 0.680077 |
| City_Torrance | 0.672859 |
| City_Glendale | 0.663426 |
| City_Moss Landing | 0.646149 |
| Family | 0.638125 |
| City_Seaside | 0.556037 |
| City_San Juan Capistrano | 0.522589 |
| City_Los Altos | 0.489764 |
| City_El Sobrante | 0.489080 |
| City_Reseda | 0.489041 |
| City_San Jose | 0.482935 |
| City_Fairfield | 0.473225 |
| City_Thousand Oaks | 0.461609 |
| City_Valencia | 0.460971 |
| City_Calabasas | 0.458693 |
| City_Ridgecrest | 0.458088 |
| City_Fawnskin | 0.426964 |
| City_Rohnert Park | 0.426482 |
| City_Beverly Hills | 0.418402 |
| City_Lomita | 0.405845 |
| City_Vallejo | 0.405464 |
| City_Riverside | 0.394553 |
| City_Santa Clarita | 0.366339 |
| City_Cypress | 0.348640 |
| City_Montebello | 0.340062 |
| City_San Clemente | 0.337455 |
| City_Placentia | 0.316495 |
| City_Chula Vista | 0.302450 |
| City_Roseville | 0.291509 |
| City_Laguna Niguel | 0.283122 |
| City_Santa Barbara | 0.264081 |
| City_Santa Cruz | 0.242712 |
| City_Hayward | 0.233567 |
| City_Pasadena | 0.216142 |
| City_Berkeley | 0.211861 |
| City_San Diego | 0.205123 |
| City_Stanford | 0.202574 |
| City_Fullerton | 0.173889 |
| CCAvg | 0.161678 |
| City_Fremont | 0.160599 |
| City_Sherman Oaks | 0.144131 |
| City_Huntington Beach | 0.137023 |
| City_Elk Grove | 0.123813 |
| City_Capitola | 0.105287 |
| City_San Luis Rey | 0.091535 |
| City_Monrovia | 0.091074 |
| City_Sacramento | 0.089503 |
| City_Oceanside | 0.076763 |
| City_San Francisco | 0.074668 |
| Age | 0.073051 |
| City_Walnut Creek | 0.063006 |
| City_San Bernardino | 0.062996 |
| City_Ventura | 0.062261 |
| Income | 0.058481 |
| City_Eureka | 0.053431 |
| City_La Jolla | 0.049407 |
| City_Costa Mesa | 0.046454 |
| City_Los Angeles | 0.046066 |
| City_Carpinteria | 0.037365 |
| City_Camarillo | 0.033949 |
| City_Norwalk | 0.024411 |
| City_Bella Vista | 0.010442 |
| City_Palo Alto | 0.002611 |
| Mortgage | 0.000971 |
| City_Pleasanton | 0.000951 |
Observation, positive coefficients:
The table above shows all variables with positive coefficients: as the value of such a variable increases, the probability that the customer accepts a personal loan increases.
The top 5 variables affecting the target variable are:
- Education_3
- Education_2
- CD_Account
- City_Los Gatos
- City_Whittier
#The Negative coefficients
coef_df[coef_df['Coefficients']<0].sort_values(by='Coefficients')
| Coefficients | |
|---|---|
| Intercept | -14.691445 |
| City_Livermore | -0.885393 |
| City_Tustin | -0.870770 |
| City_Carlsbad | -0.868982 |
| CreditCard | -0.831228 |
| City_Manhattan Beach | -0.812330 |
| City_Alhambra | -0.740918 |
| Securities_Account | -0.704059 |
| City_Davis | -0.673413 |
| City_Milpitas | -0.648076 |
| City_Loma Linda | -0.635195 |
| City_Menlo Park | -0.601949 |
| City_Redwood City | -0.574392 |
| City_Monterey | -0.571322 |
| Online | -0.559086 |
| City_Oakland | -0.555836 |
| City_Brisbane | -0.542601 |
| City_Fallbrook | -0.529842 |
| City_Redondo Beach | -0.521048 |
| City_Emeryville | -0.495187 |
| City_Northridge | -0.478665 |
| City_Arcata | -0.469662 |
| City_Santa Clara | -0.455029 |
| City_North Hollywood | -0.440298 |
| City_Studio City | -0.424727 |
| City_Diamond Bar | -0.421248 |
| City_Rancho Cordova | -0.411055 |
| City_Burlingame | -0.376264 |
| City_South San Francisco | -0.359612 |
| City_Mountain View | -0.353475 |
| City_Newbury Park | -0.348923 |
| City_Boulder Creek | -0.345930 |
| City_Santa Ana | -0.335801 |
| City_Fresno | -0.330912 |
| City_San Anselmo | -0.324996 |
| City_San Luis Obispo | -0.322205 |
| City_Mission Viejo | -0.321912 |
| City_Bonita | -0.321378 |
| City_Chatsworth | -0.308760 |
| City_Culver City | -0.299640 |
| City_San Marcos | -0.296642 |
| City_Laguna Hills | -0.279902 |
| City_Anaheim | -0.277668 |
| City_Redlands | -0.275373 |
| City_Goleta | -0.275021 |
| City_Merced | -0.273667 |
| City_Redding | -0.263811 |
| City_Orange | -0.247645 |
| City_Pomona | -0.236133 |
| City_La Mesa | -0.233303 |
| City_Santa Ynez | -0.230488 |
| City_Sanger | -0.227368 |
| City_Arcadia | -0.226941 |
| City_Canoga Park | -0.225762 |
| City_Hermosa Beach | -0.225225 |
| City_Marina | -0.224952 |
| City_Modesto | -0.221858 |
| City_La Palma | -0.219094 |
| City_Palos Verdes Peninsula | -0.203787 |
| City_Salinas | -0.203025 |
| City_Antioch | -0.189764 |
| City_National City | -0.186494 |
| City_San Leandro | -0.178120 |
| City_Capistrano Beach | -0.175551 |
| City_Alameda | -0.174646 |
| City_Sunland | -0.170529 |
| City_Chino Hills | -0.166438 |
| City_West Covina | -0.164322 |
| City_San Ramon | -0.152015 |
| City_Montclair | -0.149160 |
| City_San Juan Bautista | -0.147283 |
| City_San Mateo | -0.146367 |
| City_Ojai | -0.144069 |
| City_Rosemead | -0.131931 |
| City_Simi Valley | -0.130683 |
| City_Tehachapi | -0.129535 |
| City_Sonora | -0.123752 |
| City_Highland | -0.121139 |
| City_Seal Beach | -0.120118 |
| City_Lake Forest | -0.117873 |
| City_Monterey Park | -0.114071 |
| City_Van Nuys | -0.106638 |
| City_Sierra Madre | -0.106416 |
| City_Ukiah | -0.106321 |
| City_Poway | -0.103103 |
| City_Citrus Heights | -0.102328 |
| City_Rancho Cucamonga | -0.098986 |
| City_Hopland | -0.093481 |
| City_Newport Beach | -0.089097 |
| City_South Lake Tahoe | -0.087573 |
| City_Chico | -0.086325 |
| City_Los Alamitos | -0.084993 |
| City_Portola Valley | -0.083165 |
| City_Porter Ranch | -0.082931 |
| City_Santa Monica | -0.078165 |
| City_San Bruno | -0.075458 |
| City_Albany | -0.073917 |
| City_Baldwin Park | -0.073611 |
| City_Aptos | -0.073277 |
| City_El Dorado Hills | -0.072933 |
| City_La Mirada | -0.072853 |
| City_Glendora | -0.071343 |
| City_Larkspur | -0.070541 |
| Experience | -0.068624 |
| City_San Gabriel | -0.065439 |
| City_Hollister | -0.064162 |
| City_Trinity Center | -0.064063 |
| City_Bodega Bay | -0.063711 |
| City_Belvedere Tiburon | -0.061032 |
| City_March Air Reserve Base | -0.060479 |
| City_Rio Vista | -0.059178 |
| City_Concord | -0.057318 |
| City_Cupertino | -0.056513 |
| City_Westlake Village | -0.055226 |
| City_Gilroy | -0.053535 |
| City_Pleasant Hill | -0.052104 |
| City_Claremont | -0.044581 |
| City_Long Beach | -0.038953 |
| City_Yorba Linda | -0.036893 |
| City_Crestline | -0.036679 |
| City_South Gate | -0.036029 |
| City_Mission Hills | -0.035818 |
| City_Daly City | -0.031724 |
| City_Brea | -0.031526 |
| City_Inglewood | -0.031001 |
| City_Castro Valley | -0.028638 |
| City_Folsom | -0.026668 |
| City_Santa Rosa | -0.026281 |
| City_Carson | -0.025603 |
| City_Montague | -0.024715 |
| City_San Rafael | -0.022831 |
| City_Saratoga | -0.022165 |
| City_Clovis | -0.019849 |
| City_Danville | -0.019296 |
| City_Belmont | -0.018305 |
| City_Stockton | -0.017588 |
| City_Clearlake | -0.017547 |
| City_Weed | -0.017243 |
| City_Hawthorne | -0.015285 |
| City_Hacienda Heights | -0.012823 |
| City_El Segundo | -0.012718 |
| City_Moraga | -0.011544 |
| City_Imperial | -0.011336 |
| City_Rudno nad Hronom | -0.011115 |
| City_San Pablo | -0.010608 |
| City_San Dimas | -0.008629 |
| City_Sylmar | -0.008480 |
| City_Ben Lomond | -0.008379 |
| State_CA | -0.006848 |
| City_Signal Hill | -0.006497 |
| City_Yucaipa | -0.006452 |
| City_Escondido | -0.005825 |
| City_Morgan Hill | -0.005472 |
| City_Napa | -0.005363 |
| City_Vista | -0.004620 |
| City_Encinitas | -0.004001 |
| City_Upland | -0.002961 |
| City_North Hills | -0.002844 |
| City_Alamo | -0.002776 |
| City_Pacific Grove | -0.002188 |
| City_Edwards | -0.001852 |
| City_South Pasadena | -0.001376 |
| City_Garden Grove | -0.001229 |
| City_Rancho Palos Verdes | -0.000939 |
| City_Chino | -0.000763 |
| City_Lompoc | -0.000517 |
| City_Oxnard | -0.000512 |
| City_Pacific Palisades | -0.000317 |
| City_Sausalito | -0.000302 |
| City_Tahoe City | -0.000186 |
| City_Half Moon Bay | -0.000064 |
Observation, negative coefficients:
The table above shows all variables with negative coefficients: as the value of such a variable increases, the probability that the customer accepts a personal loan decreases.
The top 5 variables affecting the target variable are:
- City_Livermore
- CreditCard
- Securities_Account
- City_Davis
- City_Carlsbad
Interpreting the coefficients as odds
# converting coefficients to odds
odds = np.exp(lg.coef_[0])
# finding the percentage change
perc_change_odds = (np.exp(lg.coef_[0]) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train.columns).sort_values(by='Odds',ascending=False)
| Odds | Change_odd% | |
|---|---|---|
| Education_3 | 40.874366 | 3987.436574 |
| Education_2 | 32.203258 | 3120.325806 |
| CD_Account | 23.098949 | 2209.894917 |
| City_Los Gatos | 3.369340 | 236.934039 |
| City_Whittier | 2.499195 | 149.919451 |
| ... | ... | ... |
| City_Manhattan Beach | 0.443823 | -55.617718 |
| CreditCard | 0.435514 | -56.448568 |
| City_Carlsbad | 0.419378 | -58.062174 |
| City_Tustin | 0.418629 | -58.137092 |
| City_Livermore | 0.412552 | -58.744819 |
257 rows × 2 columns
Overall Observation
A. The top 5 variables affecting the target variable positively are:
- Education_3 : multiplies the customer's odds of accepting a personal loan by about 41
- Education_2 : multiplies the odds by about 32
- CD_Account : multiplies the odds by about 23
- City_Los Gatos : multiplies the odds by about 3.4
- City_Whittier : multiplies the odds by about 2.5
B. The top 5 variables affecting the target variable negatively are:
- City_Livermore : multiplies the odds by about 0.41 (a ~59% decrease)
- City_Tustin : multiplies the odds by about 0.42 (a ~58% decrease)
- City_Carlsbad : multiplies the odds by about 0.42 (a ~58% decrease)
- CreditCard : multiplies the odds by about 0.44 (a ~56% decrease)
- City_Manhattan Beach : multiplies the odds by about 0.44 (a ~56% decrease)
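As a quick sanity check on the conversion used above: the odds multiplier is exp(coefficient), and the percentage change is (exp(coefficient) - 1) × 100. Plugging in the Education_3 coefficient from the coefficient table:

```python
import numpy as np

coef = 3.710503          # Education_3 coefficient from the fitted model above
odds = np.exp(coef)      # odds multiplier
pct = (odds - 1) * 100   # percentage change in odds

print(round(odds, 2))    # ~40.87, matching the odds table
print(round(pct, 1))     # ~3987.4% increase
```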
Model Performance Evaluation - Logistic Regression¶
Checking model performance on training set, threshold = 0.5
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(lg, X_train, y_train)
Checking performance on the training set
log_reg_model_train_perf = model_performance_classification_sklearn_with_threshold(
lg, X_train, y_train
)
print("Training Set performance:")
log_reg_model_train_perf
Training Set performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.96325 | 0.689474 | 0.900344 | 0.780924 |
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(lg, X_test, y_test)
log_reg_model_test_perf = model_performance_classification_sklearn_with_threshold(
lg, X_test, y_test
)
print("Test set performance:")
log_reg_model_test_perf
Test set performance:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.954 | 0.64 | 0.864865 | 0.735632 |
Observation:
The recall on the training and test sets is close, 69% and 64% respectively, but the model still misses over a third of the customers who would accept a loan, so performance needs to be improved, as the confusion matrix also shows:
The model captures true positives amounting to only about 6.4% of the test set, against a positive class of 10%
Model Performance Improvement¶
We will lower the classification threshold to 0.10 and re-check model performance on the training and test sets
# creating confusion matrix for train set
confusion_matrix_sklearn_with_threshold(
lg, X_train, y_train, threshold=0.10
)
# checking model performance for this model
log_reg_model_train_perf_threshold_curve = model_performance_classification_sklearn_with_threshold(
lg, X_train, y_train, threshold=0.10
)
print("Training performance with deduced threshold from the Precision - Recall curve is:")
log_reg_model_train_perf_threshold_curve
Training performance with deduced threshold from the Precision - Recall curve is:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.9045 | 0.913158 | 0.498563 | 0.644981 |
Checking model performance on test set
# creating confusion matrix
confusion_matrix_sklearn_with_threshold(
lg, X_test, y_test, threshold=0.10
)
# checking model performance for this model
log_reg_model_test_perf_threshold_curve = model_performance_classification_sklearn_with_threshold(
lg, X_test, y_test, threshold=0.10
)
print("Test performance with deduced threshold from the Precision - Recall curve is:")
log_reg_model_test_perf_threshold_curve
Test performance with deduced threshold from the Precision - Recall curve is:
| Accuracy | Recall | Precision | F1 | |
|---|---|---|---|---|
| 0 | 0.902 | 0.87 | 0.505814 | 0.639706 |
Observation and conclusion:
At threshold 0.10 the recall on the test and training sets rises to 87% and 91% respectively, a good improvement by the logistic regression model: false negatives fall to only about 1.3% of the test set, while precision stays at approximately 50% on both the training and test sets.
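Rather than hand-picking 0.10, the threshold can also be chosen programmatically, e.g. as the highest threshold that still attains a target recall on the training data. A sketch on synthetic data (the data and the 0.90 recall target are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

prec, rec, thr = precision_recall_curve(y, clf.predict_proba(X)[:, 1])
# recall is non-increasing in the threshold; keep the largest threshold
# whose recall still meets the target
ok = rec[:-1] >= 0.90
best_threshold = thr[ok].max()
print(best_threshold)
```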
Let us apply a sequential feature selector¶
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
# to plot the performance with addition of each feature
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
# from sklearn.linear_model import LogisticRegression
# Fit the model on train
model = LogisticRegression(solver="newton-cg", n_jobs=-1, random_state=1, max_iter=100)
X_train.shape
(4000, 257)
# we will first build the model with all variables
sfs = SFS(
model,
k_features=257,
forward=True,
floating=False,
scoring="f1",
verbose=2,
cv=2,
n_jobs=-1,
)
sfs = sfs.fit(X_train, y_train)
[Verbose SFS log, truncated. Cross-validated F1 rises from about 0.41 with 1 feature to about 0.748 by 9 features, reaches about 0.757 at 15 features, and plateaus near 0.759 from roughly 32 features onward.]
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 203 out of 203 | elapsed: 4.7s finished [2025-11-17 04:31:25] Features: 55/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 187 out of 202 | elapsed: 4.4s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 202 out of 202 | elapsed: 4.7s finished [2025-11-17 04:31:29] Features: 56/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 201 out of 201 | elapsed: 4.6s finished [2025-11-17 04:31:34] Features: 57/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 4.7s finished [2025-11-17 04:31:39] Features: 58/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 184 out of 199 | elapsed: 4.7s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 199 out of 199 | elapsed: 5.0s finished [2025-11-17 04:31:44] Features: 59/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 198 out of 198 | elapsed: 4.6s finished [2025-11-17 04:31:49] Features: 60/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 182 out of 197 | elapsed: 4.6s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 197 out of 197 | elapsed: 4.9s finished [2025-11-17 04:31:53] Features: 61/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 196 out of 196 | elapsed: 4.7s finished [2025-11-17 04:31:58] Features: 62/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 180 out of 195 | elapsed: 4.5s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 195 out of 195 | elapsed: 4.7s finished [2025-11-17 04:32:03] Features: 63/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 194 out of 194 | elapsed: 4.9s finished [2025-11-17 04:32:08] Features: 64/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 178 out of 193 | elapsed: 4.5s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 193 out of 193 | elapsed: 4.7s finished [2025-11-17 04:32:13] Features: 65/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 192 out of 192 | elapsed: 4.9s finished [2025-11-17 04:32:17] Features: 66/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 191 out of 191 | elapsed: 4.9s finished [2025-11-17 04:32:22] Features: 67/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 175 out of 190 | elapsed: 4.4s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 190 out of 190 | elapsed: 4.8s finished [2025-11-17 04:32:27] Features: 68/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 174 out of 189 | elapsed: 4.7s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 189 out of 189 | elapsed: 5.1s finished [2025-11-17 04:32:32] Features: 69/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 188 out of 188 | elapsed: 4.8s finished [2025-11-17 04:32:37] Features: 70/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 187 out of 187 | elapsed: 5.1s finished [2025-11-17 04:32:42] Features: 71/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 171 out of 186 | elapsed: 4.5s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 186 out of 186 | elapsed: 4.8s finished [2025-11-17 04:32:47] Features: 72/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 170 out of 185 | elapsed: 4.6s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 185 out of 185 | elapsed: 4.8s finished [2025-11-17 04:32:52] Features: 73/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 169 out of 184 | elapsed: 4.8s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 184 out of 184 | elapsed: 5.1s finished [2025-11-17 04:32:57] Features: 74/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.7s [Parallel(n_jobs=-1)]: Done 183 out of 183 | elapsed: 5.8s finished [2025-11-17 04:33:03] Features: 75/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.9s [Parallel(n_jobs=-1)]: Done 182 out of 182 | elapsed: 6.0s finished [2025-11-17 04:33:09] Features: 76/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 166 out of 181 | elapsed: 4.6s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 181 out of 181 | elapsed: 4.8s finished [2025-11-17 04:33:14] Features: 77/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.9s [Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed: 5.9s finished [2025-11-17 04:33:19] Features: 78/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.8s [Parallel(n_jobs=-1)]: Done 179 out of 179 | elapsed: 5.7s finished [2025-11-17 04:33:25] Features: 79/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 178 out of 178 | elapsed: 4.9s finished [2025-11-17 04:33:30] Features: 80/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.8s [Parallel(n_jobs=-1)]: Done 177 out of 177 | elapsed: 5.8s finished [2025-11-17 04:33:36] Features: 81/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 176 out of 176 | elapsed: 4.8s finished [2025-11-17 04:33:41] Features: 82/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.0s [Parallel(n_jobs=-1)]: Done 175 out of 175 | elapsed: 5.8s finished [2025-11-17 04:33:47] Features: 83/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.7s [Parallel(n_jobs=-1)]: Done 174 out of 174 | elapsed: 5.6s finished [2025-11-17 04:33:52] Features: 84/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.0s [Parallel(n_jobs=-1)]: Done 173 out of 173 | elapsed: 5.8s finished [2025-11-17 04:33:58] Features: 85/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 157 out of 172 | elapsed: 4.5s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 172 out of 172 | elapsed: 4.8s finished [2025-11-17 04:34:03] Features: 86/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.1s [Parallel(n_jobs=-1)]: Done 171 out of 171 | elapsed: 5.8s finished [2025-11-17 04:34:09] Features: 87/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 170 out of 170 | elapsed: 4.8s finished [2025-11-17 04:34:13] Features: 88/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.0s [Parallel(n_jobs=-1)]: Done 169 out of 169 | elapsed: 5.8s finished [2025-11-17 04:34:19] Features: 89/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 153 out of 168 | elapsed: 4.5s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 168 out of 168 | elapsed: 4.8s finished [2025-11-17 04:34:24] Features: 90/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.0s [Parallel(n_jobs=-1)]: Done 167 out of 167 | elapsed: 5.7s finished [2025-11-17 04:34:30] Features: 91/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 4.9s [Parallel(n_jobs=-1)]: Done 166 out of 166 | elapsed: 5.5s finished [2025-11-17 04:34:35] Features: 92/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 150 out of 165 | elapsed: 4.5s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 165 out of 165 | elapsed: 4.8s finished [2025-11-17 04:34:40] Features: 93/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.2s [Parallel(n_jobs=-1)]: Done 164 out of 164 | elapsed: 5.7s finished [2025-11-17 04:34:46] Features: 94/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.0s [Parallel(n_jobs=-1)]: Done 163 out of 163 | elapsed: 5.5s finished [2025-11-17 04:34:51] Features: 95/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.4s [Parallel(n_jobs=-1)]: Done 162 out of 162 | elapsed: 5.0s finished [2025-11-17 04:34:56] Features: 96/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 146 tasks | elapsed: 5.1s [Parallel(n_jobs=-1)]: Done 161 out of 161 | elapsed: 5.5s finished [2025-11-17 04:35:02] Features: 97/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed: 5.7s finished [2025-11-17 04:35:07] Features: 98/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 159 out of 159 | elapsed: 5.5s finished [2025-11-17 04:35:13] Features: 99/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 158 out of 158 | elapsed: 5.4s finished [2025-11-17 04:35:18] Features: 100/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 157 out of 157 | elapsed: 5.5s finished [2025-11-17 04:35:24] Features: 101/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 156 out of 156 | elapsed: 5.4s finished [2025-11-17 04:35:29] Features: 102/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 155 out of 155 | elapsed: 5.6s finished [2025-11-17 04:35:35] Features: 103/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 154 out of 154 | elapsed: 5.3s finished [2025-11-17 04:35:40] Features: 104/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 153 out of 153 | elapsed: 5.6s finished [2025-11-17 04:35:46] Features: 105/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 152 out of 152 | elapsed: 5.3s finished [2025-11-17 04:35:51] Features: 106/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 151 out of 151 | elapsed: 5.6s finished [2025-11-17 04:35:57] Features: 107/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 5.2s finished [2025-11-17 04:36:02] Features: 108/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 149 out of 149 | elapsed: 5.4s finished [2025-11-17 04:36:07] Features: 109/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 148 out of 148 | elapsed: 5.3s finished [2025-11-17 04:36:13] Features: 110/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 147 out of 147 | elapsed: 5.3s finished [2025-11-17 04:36:18] Features: 111/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 146 out of 146 | elapsed: 5.4s finished [2025-11-17 04:36:23] Features: 112/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 145 out of 145 | elapsed: 5.2s finished [2025-11-17 04:36:29] Features: 113/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed: 5.4s finished [2025-11-17 04:36:34] Features: 114/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 143 out of 143 | elapsed: 5.2s finished [2025-11-17 04:36:39] Features: 115/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 142 out of 142 | elapsed: 5.2s finished [2025-11-17 04:36:44] Features: 116/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 141 out of 141 | elapsed: 5.0s finished [2025-11-17 04:36:49] Features: 117/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 140 out of 140 | elapsed: 5.0s finished [2025-11-17 04:36:54] Features: 118/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 139 out of 139 | elapsed: 5.2s finished [2025-11-17 04:37:00] Features: 119/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.0s [Parallel(n_jobs=-1)]: Done 138 out of 138 | elapsed: 5.0s finished [2025-11-17 04:37:05] Features: 120/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 137 out of 137 | elapsed: 5.2s finished [2025-11-17 04:37:10] Features: 121/257 -- score: 0.7581978773668065[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 136 out of 136 | elapsed: 4.4s finished [2025-11-17 04:37:14] Features: 122/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 135 out of 135 | elapsed: 5.1s finished [2025-11-17 04:37:19] Features: 123/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 134 out of 134 | elapsed: 5.0s finished [2025-11-17 04:37:24] Features: 124/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 133 out of 133 | elapsed: 4.9s finished [2025-11-17 04:37:29] Features: 125/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 132 out of 132 | elapsed: 5.1s finished [2025-11-17 04:37:34] Features: 126/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 131 out of 131 | elapsed: 4.9s finished [2025-11-17 04:37:39] Features: 127/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 130 out of 130 | elapsed: 5.1s finished [2025-11-17 04:37:44] Features: 128/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 129 out of 129 | elapsed: 4.9s finished [2025-11-17 04:37:49] Features: 129/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 128 out of 128 | elapsed: 4.9s finished [2025-11-17 04:37:54] Features: 130/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 127 out of 127 | elapsed: 5.1s finished [2025-11-17 04:37:59] Features: 131/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 126 out of 126 | elapsed: 4.8s finished [2025-11-17 04:38:04] Features: 132/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 125 out of 125 | elapsed: 5.0s finished [2025-11-17 04:38:09] Features: 133/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 124 out of 124 | elapsed: 4.7s finished [2025-11-17 04:38:14] Features: 134/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 123 out of 123 | elapsed: 4.7s finished [2025-11-17 04:38:18] Features: 135/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 122 out of 122 | elapsed: 4.9s finished [2025-11-17 04:38:23] Features: 136/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1s [Parallel(n_jobs=-1)]: Done 121 out of 121 | elapsed: 4.7s finished [2025-11-17 04:38:28] Features: 137/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed: 4.9s finished [2025-11-17 04:38:33] Features: 138/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 119 out of 119 | elapsed: 4.8s finished [2025-11-17 04:38:38] Features: 139/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 118 out of 118 | elapsed: 4.7s finished [2025-11-17 04:38:42] Features: 140/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 117 out of 117 | elapsed: 4.9s finished [2025-11-17 04:38:47] Features: 141/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 116 out of 116 | elapsed: 4.6s finished [2025-11-17 04:38:52] Features: 142/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 115 out of 115 | elapsed: 4.8s finished [2025-11-17 04:38:57] Features: 143/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.3s [Parallel(n_jobs=-1)]: Done 114 out of 114 | elapsed: 4.7s finished [2025-11-17 04:39:01] Features: 144/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2s [Parallel(n_jobs=-1)]: Done 113 out of 113 | elapsed: 4.6s finished [2025-11-17 04:39:06] Features: 145/257 -- score: 0.7592638427268228[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Verbose SFS output truncated: each step logs joblib's LokyBackend parallel progress with 8 concurrent workers. The cross-validated F1 score plateaus at 0.7593 from roughly feature 146 through feature 232, then gradually declines (0.7582 at 233 features, 0.7591 at 235–240, 0.7512 at 246, 0.7449 at 254) down to 0.7284 with all 257 features.]
fig1 = plot_sfs(sfs.get_metric_dict(), kind="std_dev", figsize=(40, 5))
plt.title("Sequential Forward Selection (w. StdDev)")
plt.xticks(rotation=90)
plt.show()
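The plateau visible in the plot can also be located programmatically from the metric dictionary that mlxtend's `sfs.get_metric_dict()` returns (a mapping from feature count to per-step results, including `avg_score`). A minimal sketch, using a small hypothetical stand-in dictionary rather than the notebook's actual results:

```python
# Hedged sketch: find the smallest feature count whose CV score is within
# a tolerance of the best score seen. The metric_dict below is a hypothetical
# stand-in mimicking the structure of mlxtend's sfs.get_metric_dict().
metric_dict = {
    k: {"avg_score": score}
    for k, score in enumerate(
        [0.41, 0.54, 0.59, 0.71, 0.74, 0.755, 0.758, 0.759, 0.759, 0.759],
        start=1,
    )
}

def smallest_plateau_k(metric_dict, tol=1e-4):
    """Smallest number of features whose CV score is within `tol` of the best."""
    best = max(d["avg_score"] for d in metric_dict.values())
    return min(k for k, d in metric_dict.items() if best - d["avg_score"] <= tol)

print(smallest_plateau_k(metric_dict))  # → 8
```

Applied to the real `sfs.get_metric_dict()`, this gives an objective cutoff for how many features to keep instead of eyeballing the plot.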
Observation:
The performance plateaus at approximately the 18th feature. Let us therefore fit a new selector with only 35 variables and display the top 18 features.
# Forward selection capped at 35 features, scored by F1 with 2-fold CV
sfs1 = SFS(
    model,
    k_features=35,
    forward=True,      # add features one at a time
    floating=False,    # plain SFS, no conditional removal step
    scoring="f1",
    verbose=2,
    cv=2,
    n_jobs=-1,         # use all available cores
)
sfs1 = sfs1.fit(X_train, y_train)
fig1 = plot_sfs(sfs1.get_metric_dict(), kind="std_dev", figsize=(10, 5))
plt.title("Sequential Forward Selection (w. StdDev)")
plt.grid()
plt.show()
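With the mlxtend selector above, the chosen columns are available afterwards via `sfs1.k_feature_names_`. For comparison, scikit-learn ships an equivalent `SequentialFeatureSelector` implementing the same forward-selection idea; a minimal sketch on synthetic data (the dataset, estimator, and feature counts here are illustrative, not the notebook's actual setup):

```python
# Hedged sketch: forward sequential feature selection with scikit-learn,
# on synthetic data. DecisionTreeClassifier stands in for the notebook's model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, random_state=1
)
selector = SequentialFeatureSelector(
    DecisionTreeClassifier(random_state=1),
    n_features_to_select=5,   # analogous to k_features=35 above
    direction="forward",      # analogous to forward=True
    scoring="f1",
    cv=2,
    n_jobs=-1,
)
selector.fit(X, y)
print(selector.get_support().sum())  # number of selected features
```

`selector.get_support()` returns a boolean mask over the columns, which can be used to subset the training data before refitting the final model.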
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 242 out of 257 | elapsed: 3.1s remaining: 0.2s [Parallel(n_jobs=-1)]: Done 257 out of 257 | elapsed: 3.2s finished [2025-11-17 04:44:24] Features: 1/35 -- score: 0.4104808235243018[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 256 out of 256 | elapsed: 4.0s finished [2025-11-17 04:44:28] Features: 2/35 -- score: 0.5389730931641541[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 255 out of 255 | elapsed: 3.9s finished [2025-11-17 04:44:32] Features: 3/35 -- score: 0.5878732445359096[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.8s [Parallel(n_jobs=-1)]: Done 239 out of 254 | elapsed: 3.8s remaining: 0.2s [Parallel(n_jobs=-1)]: Done 254 out of 254 | elapsed: 4.1s finished [2025-11-17 04:44:36] Features: 4/35 -- score: 0.7075021174604891[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 253 out of 253 | elapsed: 4.3s finished [2025-11-17 04:44:40] Features: 5/35 -- score: 0.7233252563594652[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.8s [Parallel(n_jobs=-1)]: Done 237 out of 252 | elapsed: 4.0s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 252 out of 252 | elapsed: 4.2s finished [2025-11-17 04:44:44] Features: 6/35 -- score: 0.7376643013945693[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.8s [Parallel(n_jobs=-1)]: Done 236 out of 251 | elapsed: 4.1s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 251 out of 251 | elapsed: 4.3s finished [2025-11-17 04:44:49] Features: 7/35 -- score: 0.7416059055296409[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 235 out of 250 | elapsed: 3.9s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed: 4.1s finished [2025-11-17 04:44:53] Features: 8/35 -- score: 0.7453903774128494[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.8s [Parallel(n_jobs=-1)]: Done 234 out of 249 | elapsed: 4.0s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 249 out of 249 | elapsed: 4.2s finished [2025-11-17 04:44:57] Features: 9/35 -- score: 0.7481993661768942[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.8s [Parallel(n_jobs=-1)]: Done 248 out of 248 | elapsed: 4.4s finished [2025-11-17 04:45:01] Features: 10/35 -- score: 0.7492596605272661[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 232 out of 247 | elapsed: 4.0s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 247 out of 247 | elapsed: 4.2s finished [2025-11-17 04:45:06] Features: 11/35 -- score: 0.7512667056653017[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 246 out of 246 | elapsed: 4.1s finished [2025-11-17 04:45:10] Features: 12/35 -- score: 0.7530154000789748[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers. 
[SequentialFeatureSelector verbose log, condensed] Forward-selection cross-validation scores by feature count: 13 → 0.7548, 14 → 0.7567, 15 → 0.7574, 16–31 → 0.7585, 32–35 → 0.7593. (Per-step joblib/LokyBackend worker timing output omitted.)
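The log above is the output of a forward sequential feature selector (the `sfs1` object used below; the log format matches mlxtend's `SequentialFeatureSelector`). A minimal, hypothetical sketch of the same idea using scikit-learn's built-in `SequentialFeatureSelector` on synthetic data — the estimator, data, and number of features here are illustrative, not the notebook's actual setup:

```python
# Hedged sketch of forward feature selection; stands in for the notebook's sfs1.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=1)
est = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(
    est, n_features_to_select=5, direction="forward", scoring="recall", cv=3
)
sfs.fit(X, y)
# Column indices retained by forward selection (analogous to sfs1.k_feature_idx_)
selected_idx = [i for i, keep in enumerate(sfs.get_support()) if keep]
print(selected_idx)
```

At each step the selector adds the single feature that most improves the cross-validated score, which is why the logged score is non-decreasing as the feature count grows.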
The selection score plateaus around 0.759; displaying the selected feature names.
Model Performance Evaluation - Simplified model (35 variables)¶
feat_cols = list(sfs1.k_feature_idx_)
print(feat_cols)
X_train.columns[feat_cols]
[2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 29, 37, 40, 42, 43, 100, 117, 119, 123, 149, 182, 206]
Index(['Income', 'Family', 'Securities_Account', 'CD_Account', 'Online',
'CreditCard', 'Education_2', 'Education_3', 'City_Alameda',
'City_Alamo', 'City_Albany', 'City_Alhambra', 'City_Anaheim',
'City_Antioch', 'City_Aptos', 'City_Arcadia', 'City_Arcata',
'City_Bakersfield', 'City_Baldwin Park', 'City_Banning',
'City_Bella Vista', 'City_Belmont', 'City_Ben Lomond',
'City_Beverly Hills', 'City_Camarillo', 'City_Capistrano Beach',
'City_Cardiff By The Sea', 'City_Carlsbad', 'City_Irvine',
'City_Los Angeles', 'City_Manhattan Beach', 'City_Menlo Park',
'City_Oakland', 'City_Sacramento', 'City_Santa Barbara'],
dtype='object')
X_train_sfs = X_train[X_train.columns[feat_cols]]
# Creating new x_test with the same variables that we selected for x_train
X_test_sfs = X_test[X_train_sfs.columns]
print(f'''X_train shape:{X_train_sfs.shape}
X_test shape:{X_test_sfs.shape}''')
X_train shape:(4000, 35) X_test shape:(1000, 35)
X_train_sfs
| | Income | Family | Securities_Account | CD_Account | Online | CreditCard | Education_2 | Education_3 | City_Alameda | City_Alamo | City_Albany | City_Alhambra | City_Anaheim | City_Antioch | City_Aptos | City_Arcadia | City_Arcata | City_Bakersfield | City_Baldwin Park | City_Banning | City_Bella Vista | City_Belmont | City_Ben Lomond | City_Beverly Hills | City_Camarillo | City_Capistrano Beach | City_Cardiff By The Sea | City_Carlsbad | City_Irvine | City_Los Angeles | City_Manhattan Beach | City_Menlo Park | City_Oakland | City_Sacramento | City_Santa Barbara |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1233 | 22 | 2 | 0 | 0 | 1 | 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1056 | 25 | 1 | 0 | 0 | 1 | 0 | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1686 | 39 | 4 | 1 | 0 | 1 | 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 187 | 159 | 3 | 0 | 0 | 1 | 0 | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3840 | 35 | 3 | 0 | 0 | 0 | 0 | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2895 | 39 | 4 | 0 | 0 | 1 | 0 | True | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2763 | 13 | 4 | 0 | 0 | 1 | 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 905 | 28 | 1 | 0 | 0 | 1 | 1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3980 | 89 | 4 | 0 | 0 | 1 | 0 | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 235 | 71 | 4 | 0 | 0 | 1 | 0 | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
4000 rows × 35 columns
# Fitting logistic regression model
log_sfs = LogisticRegression(
solver="newton-cg", penalty=None, verbose=True, n_jobs=-1, random_state=1
)
# Several solvers are available; 'newton-cg' is used here
# penalty=None fits the model without regularization
log_sfs.fit(X_train_sfs, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
LogisticRegression(n_jobs=-1, penalty=None, random_state=1, solver='newton-cg',
verbose=True)
Model performance on training set¶
confusion_matrix_sklearn_with_threshold(log_sfs, X_train_sfs, y_train)
log_reg_model_train_perf_SFS = model_performance_classification_sklearn_with_threshold(
log_sfs, X_train_sfs, y_train
)
print("Training performance:")
log_reg_model_train_perf_SFS
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.96275 | 0.7 | 0.883721 | 0.781204 |
Model performance on test set¶
confusion_matrix_sklearn_with_threshold(log_sfs, X_test_sfs, y_test)
log_reg_model_test_perf_SFS = model_performance_classification_sklearn_with_threshold(
log_sfs, X_test_sfs, y_test
)
print("Test set performance:")
log_reg_model_test_perf_SFS
Test set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.953 | 0.66 | 0.835443 | 0.73743 |
Recall with the default threshold of 0.5 remains low after feature selection, so we evaluate the same model at a threshold of 0.1 and observe its performance.
log_reg_model_train_perf_SFS_tre = model_performance_classification_sklearn_with_threshold(
log_sfs, X_train_sfs, y_train, threshold=0.1
)
print("Training performance:")
log_reg_model_train_perf_SFS_tre
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.9035 | 0.889474 | 0.495601 | 0.636535 |
log_reg_model_test_perf_SFS_tre = model_performance_classification_sklearn_with_threshold(
log_sfs, X_test_sfs, y_test, threshold=0.1
)
print("Test set performance:")
log_reg_model_test_perf_SFS_tre
Test set performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.895 | 0.84 | 0.485549 | 0.615385 |
After lowering the threshold, the train and test recall remain reasonably close: recall(train) = 0.8895 and recall(test) = 0.8400.
Logistic Regression Models Comparison:

| Sr | Model name | Recall (train) | Recall (test) | Threshold | No. of variables |
|---|---|---|---|---|---|
| 1 | lg | 0.6894 | 0.6500 | 0.50 | 257 |
| 2 | lg | 0.9131 | 0.8700 | 0.10 | 257 |
| 3 | log_sfs | 0.7000 | 0.6600 | 0.50 | 35 |
| 4 | log_sfs | 0.8895 | 0.8400 | 0.10 | 35 |
Observation: The best-performing model is Model 2, the full lg model with 257 variables tuned to a threshold of 0.10.
Conclusion:
- A low threshold yields good model recall; this is explained by the low percentage of class 1 customers (those who accept a personal loan) in the original data set.
This model will be used in the final comparison against the decision tree models.
#function to plot a decision tree
def plot_tree(model, Predictor):
feature_names = Predictor.columns.to_list()
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
Building the original tree T_o
The starting point is building the full tree with the default hyperparameters and observing:
Model performance
Variable importances
Model improvement strategy
It was also noted that the frequency of classes in the training set is:
Class Frequency
0 - 0.905429
1 - 0.094571
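The `class_weight` dict passed to the tree models below (`{0: 0.094571, 1: 0.905429}`) simply swaps these class frequencies, so the rare positive class carries the larger weight. A small sketch of that computation — the illustrative series mimics the imbalance and is not the actual `y_train`:

```python
# Hedged sketch: derive inverse-frequency class weights from label counts.
import pandas as pd

y_demo = pd.Series([0] * 905 + [1] * 95)    # roughly the notebook's 90.5/9.5 split
freq = y_demo.value_counts(normalize=True)  # class frequencies
weights = {0: freq[1], 1: freq[0]}          # each class weighted by the other's frequency
print(weights)
```

Weighting this way makes a misclassified positive cost about 9.6x as much as a misclassified negative during tree growing, which pushes the fit toward higher recall on class 1.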
#creating the decision tree model
t_0 = DecisionTreeClassifier(criterion="gini", class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
#fitting the training data
t_0.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
Model performance Evaluation of T_o
#Calculating the Recall for train and test data
Recall_Train_T_0 = get_recall_score(t_0, X_train, y_train)
print(f'Recall for T_0 on Train Data = {Recall_Train_T_0}')
Recall_Test_T_0 = get_recall_score(t_0, X_test, y_test)
print(f'Recall for T_0 on Test Data = {Recall_Test_T_0}')
Recall for T_0 on Train Data = 1.0 Recall for T_0 on Test Data = 0.85
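`get_recall_score` is a helper defined earlier in the notebook; a minimal sketch, assuming it wraps sklearn's `recall_score` around `model.predict` (the demo data and tree are illustrative):

```python
# Hedged sketch of the get_recall_score helper used throughout this section.
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.tree import DecisionTreeClassifier

def get_recall_score(model, X, y):
    """Recall of the positive class for a fitted classifier."""
    return recall_score(y, model.predict(X))

X, y = make_classification(n_samples=100, random_state=1)
tree_demo = DecisionTreeClassifier(random_state=1).fit(X, y)
# An unpruned tree memorizes its training data, so train recall is 1.0
print(get_recall_score(tree_demo, X, y))
```

This is exactly the pattern seen above for T_0: perfect recall on the training set with a noticeably lower test recall is the signature of an overfit tree.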
A mismatch is observed between train and test set performance; the T_0 model appears to be overfitting the data. Let us further observe the confusion matrix and the tree structure.
confusion_matrix_sklearn(t_0, X_train, y_train)
As assumed, the model fits the training data perfectly: the confusion matrix shows FN and FP both at 0%, a clear sign of overfitting.
#plotting the tree
plot_tree(t_0,X_train)
Observation
The original tree T_o is complicated and overfits the training set; pre-pruning and post-pruning will be considered to improve model performance.
The first split of T_o is on the "Income" variable; let us first observe the feature importances.
# importance of features in the tree building (The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance )
def view_nd_plot_importance(model, predictors):
print("The features importances:")
print(
pd.DataFrame(
model.feature_importances_, columns=["Imp"], index=predictors.columns
).sort_values(by="Imp", ascending=False)
)
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 65))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [predictors.columns[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
view_nd_plot_importance(t_0, X_train)
The features importances:
Imp
Income 0.635549
Education_2 0.143484
CCAvg 0.085613
Education_3 0.048952
Family 0.041663
... ...
City_Torrance 0.000000
City_Trinity Center 0.000000
City_Tustin 0.000000
City_Ukiah 0.000000
State_CA 0.000000
[257 rows x 1 columns]
It is observed that the most important variable is Income, while the least important are the city dummies (except for Los Angeles, which contributes slightly to the model's predictive power). The top 5 variables are:
Income
Education_2
CCAvg
Education_3
Family
#let us get the max depth of T_0 to have an idea how to tune our parameters
print(f'''The max depth of the t_0 = {t_0.tree_.max_depth}
The node_count ={t_0.tree_.node_count}
The number of leaves = {t_0.tree_.n_leaves}''')
The max depth of the t_0 = 20 The node_count =179 The number of leaves = 90
# Choose the type of classifier.
t_grid = DecisionTreeClassifier(random_state=1, class_weight={0: 0.094571, 1: 0.905429})
# Grid of parameters to choose from
parameters = {
"max_depth": [5, 10, 15, 20, None],
"criterion": ["entropy", "gini"],
"splitter": ["best", "random"],
'min_samples_leaf': [1, 2, 5, 7, 10,15,20],
'max_leaf_nodes' : [2, 3, 5, 10],
"min_impurity_decrease": [0.00001, 0.0001, 0.01],
}
# Type of scoring used to compare parameter combinations
scorer = make_scorer(recall_score)
# Run the grid search
grid_obj = GridSearchCV(t_grid, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
t_grid = grid_obj.best_estimator_
# Fit the best algorithm to the data.
t_grid.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
criterion='entropy', max_depth=5, max_leaf_nodes=3,
min_impurity_decrease=1e-05, random_state=1)
Model Performance Evaluation_Pre-pruned Tree T_1
# Choose the type of classifier.
t_1 = DecisionTreeClassifier(random_state=1,max_depth=5, criterion='entropy'
, class_weight={0: 0.094571, 1: 0.905429})
t_1.fit(X_train,y_train)
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
criterion='entropy', max_depth=5, random_state=1)
Model Performance Evaluation_Pre-pruned Tree T_1
Recall_Train_T_1 = get_recall_score(t_1, X_train, y_train)
print(f'Recall for T_1 on Train Data = {Recall_Train_T_1}')
Recall_Test_T_1 = get_recall_score(t_1, X_test, y_test)
print(f'Recall for T_1 on Test Data = {Recall_Test_T_1}')
Recall for T_1 on Train Data = 0.9921052631578947 Recall for T_1 on Test Data = 0.92
Displaying the tree, the confusion matrix and variable importances
confusion_matrix_sklearn(t_1, X_train, y_train)
plot_tree(t_1,X_train)
view_nd_plot_importance(t_1, X_train)
The features importances:
Imp
Income 0.620903
Education_2 0.137543
CCAvg 0.116768
Family 0.057203
Education_3 0.054307
... ...
City_Hermosa Beach 0.000000
City_Highland 0.000000
City_Hollister 0.000000
City_Hopland 0.000000
City_Fresno 0.000000
[257 rows x 1 columns]
Observation
At max_depth=5 and criterion='entropy' (with default values for the remaining hyperparameters), model performance on the test set is better than at max_depth=None with criterion='gini'.
Recall values:
Recall for T_1 on Train Data = 0.9921
Recall for T_1 on Test Data = 0.9200
Features with max importance:
Income 0.620903
Education_2 0.137543
Less important yet still contributing to predictive power:
CCAvg 0.116768
Family 0.057203
Education_3 0.054307
Confusion matrix:
FN at 0.07%
FP at 3.95%
Tuning further hyperparameters to derive model T_2
#Choose the type of classifier.
t_2 = DecisionTreeClassifier(random_state=1,max_depth=5, criterion='entropy'
, class_weight={0: 0.094571, 1: 0.905429},max_leaf_nodes=3,
min_impurity_decrease=1e-05)
t_2.fit(X_train,y_train)
DecisionTreeClassifier(class_weight={0: 0.094571, 1: 0.905429},
criterion='entropy', max_depth=5, max_leaf_nodes=3,
min_impurity_decrease=1e-05, random_state=1)
Model Performance Evaluation_Pre-pruned Tree T_2
Recall_Train_T_2 = get_recall_score(t_2, X_train, y_train)
print(f'Recall for T_2 on Train Data = {Recall_Train_T_2}')
Recall_Test_T_2 = get_recall_score(t_2, X_test, y_test)
print(f'Recall for T_2 on Test Data = {Recall_Test_T_2}')
Recall for T_2 on Train Data = 0.9921052631578947 Recall for T_2 on Test Data = 1.0
The feature importances of this model are:
Features with max importance:
Income 0.622152
Education_2 0.129053
Less important yet still contributing to predictive power:
CCAvg 0.122196
Family 0.058051
Education_3 0.053056
Post Pruning¶
#defining the classifier
PP_t_0 = DecisionTreeClassifier(random_state=1, class_weight={0: 0.094571, 1: 0.905429})
#defining the cost-complexity pruning path
path = PP_t_0.cost_complexity_pruning_path(X_train, y_train)
#extracting the ccp_alphas and impurities from the path
ccp_alphas, impurities = path.ccp_alphas, path.impurities
#displaying ccp_alphas vs impurities to show that the total leaf impurity increases as alpha increases
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000e+00 | -1.221160e-14 |
| 1 | 1.346069e-18 | -1.221025e-14 |
| 2 | 1.346069e-18 | -1.220891e-14 |
| 3 | 1.927326e-18 | -1.220698e-14 |
| 4 | 2.141473e-18 | -1.220484e-14 |
| 5 | 2.692138e-18 | -1.220215e-14 |
| 6 | 3.212210e-18 | -1.219894e-14 |
| 7 | 4.129984e-18 | -1.219481e-14 |
| 8 | 7.648118e-18 | -1.218716e-14 |
| 9 | 7.648118e-18 | -1.217951e-14 |
| 10 | 7.908154e-18 | -1.217160e-14 |
| 11 | 2.489602e-17 | -1.214670e-14 |
| 12 | 1.077161e-16 | -1.203899e-14 |
| 13 | 1.265304e-16 | -1.191246e-14 |
| 14 | 1.364192e-04 | 5.456769e-04 |
| 15 | 1.366781e-04 | 8.190331e-04 |
| 16 | 2.521427e-04 | 2.836175e-03 |
| 17 | 2.530267e-04 | 5.113415e-03 |
| 18 | 2.555348e-04 | 5.880019e-03 |
| 19 | 2.618762e-04 | 6.665648e-03 |
| 20 | 2.649322e-04 | 7.725377e-03 |
| 21 | 2.662816e-04 | 7.991658e-03 |
| 22 | 2.715014e-04 | 8.263160e-03 |
| 23 | 3.440114e-04 | 9.639205e-03 |
| 24 | 4.585411e-04 | 1.055629e-02 |
| 25 | 4.821981e-04 | 1.103849e-02 |
| 26 | 5.146570e-04 | 1.155314e-02 |
| 27 | 5.704040e-04 | 1.269395e-02 |
| 28 | 5.918556e-04 | 1.387766e-02 |
| 29 | 6.537767e-04 | 1.453144e-02 |
| 30 | 8.471994e-04 | 1.707304e-02 |
| 31 | 1.030111e-03 | 1.810315e-02 |
| 32 | 1.030872e-03 | 1.913402e-02 |
| 33 | 1.139805e-03 | 2.027382e-02 |
| 34 | 1.563743e-03 | 2.183757e-02 |
| 35 | 1.567932e-03 | 2.497343e-02 |
| 36 | 1.647632e-03 | 2.662106e-02 |
| 37 | 2.012975e-03 | 3.064701e-02 |
| 38 | 2.551760e-03 | 3.575053e-02 |
| 39 | 2.683713e-03 | 3.843425e-02 |
| 40 | 2.819703e-03 | 4.407365e-02 |
| 41 | 3.163018e-03 | 4.723667e-02 |
| 42 | 3.251437e-03 | 5.373954e-02 |
| 43 | 4.793461e-03 | 6.811993e-02 |
| 44 | 2.088058e-02 | 8.900051e-02 |
| 45 | 3.825469e-02 | 2.037646e-01 |
| 46 | 2.962323e-01 | 4.999969e-01 |
#let us plot the alphas VS the impurities
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Observation:
It is observed that the impurities show a steep increase after alpha ≈ 0.005.
Now train a decision tree for each of the effective alphas and observe how the tree depth varies with alpha.
#an empty list for the post-pruned trees (PP_trees)
PP_trees = []
for alpha in ccp_alphas:
PP_tree = DecisionTreeClassifier(
random_state=1, ccp_alpha=alpha, class_weight={0: 0.094571, 1: 0.905429}
)
PP_tree.fit(X_train, y_train)
PP_trees.append(PP_tree)
print(
"Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
PP_trees[-1].tree_.node_count, ccp_alphas[-1]
)
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.2962323013249329
Action:
Remove the last elements of PP_trees and ccp_alphas, as they reflect the smallest tree (a single node), and proceed with visualizing the node count and depth (i.e., tree complexity) as alpha varies.
PP_trees = PP_trees[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [PP_tree.tree_.node_count for PP_tree in PP_trees]
depth = [PP_tree.tree_.max_depth for PP_tree in PP_trees]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
Observation
As alpha grows large, the tree shrinks to its smallest size and underfits the data.
We now have an idea that the alpha value giving optimal model performance lies below 0.05.
Observe how the model's recall varies with alpha for the training and test sets.
#derive the recall values for all PP_trees on the train set
recall_train = []
for PP_tree in PP_trees:
y_pred_train = PP_tree.predict(X_train)
values_train = recall_score(y_train, y_pred_train)
recall_train.append(values_train)
#derive the recall values for all PP_trees on the test set
recall_test = []
for PP_tree in PP_trees:
y_pred_test = PP_tree.predict(X_test)
values_test = recall_score(y_test, y_pred_test)
recall_test.append(values_test)
#calculating the Accuracy of test and train models
train_scores = [PP_tree.score(X_train, y_train) for PP_tree in PP_trees]
test_scores = [PP_tree.score(X_test, y_test) for PP_tree in PP_trees]
#plotting the accuracy for test and training sets
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(
ccp_alphas, train_scores, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, test_scores, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
Observation
Better performance is observed at alpha below 0.05.
The best accuracy occurs at alpha values close to zero, which could still reflect an overfitting tree.
The other option is at approximately 0.03-0.035. Still, accuracy is not the optimal performance measure for this problem, so we must find the best Recall value instead.
#plotting the recall scores for test and training sets VS Alpha
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
#let us derive the alpha value giving the best recall on the test set
index_best_model = np.argmax(recall_test)
print(f'''The recall value giving the best predictive model is: {recall_test[index_best_model]}
The best alpha value is: {ccp_alphas[index_best_model]}''')
The recall value giving the best predictive model is: 0.98 The best alpha value is: 0.002683713283947349
Deriving the best model PP_t_best_1 and fitting it to the train set
PP_t_best_1 = PP_trees[index_best_model]
PP_t_best_1.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=np.float64(0.002683713283947349),
class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
Model Performance Evaluation_Post-pruned Tree¶
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_1 = get_recall_score(PP_t_best_1, X_train, y_train)
print(f'Recall for PP_t_best_1 on Train Data = {Recall_Train_PP_t_best_1}')
Recall_Test_PP_t_best_1 = get_recall_score(PP_t_best_1, X_test, y_test)
print(f'Recall for PP_t_best_1 on Test Data = {Recall_Test_PP_t_best_1}')
Recall for PP_t_best_1 on Train Data = 1.0 Recall for PP_t_best_1 on Test Data = 0.98
Observation
The Recall on the train set is 1.0, and the test recall (0.98) is close to it and clearly better than T_0's.
The model may still benefit from further improvement.
Derive the tree, confusion matrix and feature importances for this model PP_t_best_1.
Then derive another model at the second peak of alpha.
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_1.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_1, X_train, y_train)
plot_tree(PP_t_best_1,X_train)
view_nd_plot_importance(PP_t_best_1, X_train)
The tree depth is : 6
The features importances:
Imp
Income 0.671525
Education_2 0.155432
CCAvg 0.079833
Education_3 0.052849
Family 0.040362
... ...
City_Garden Grove 0.000000
City_Fullerton 0.000000
City_Fresno 0.000000
City_Fremont 0.000000
City_Greenbrae 0.000000
[257 rows x 1 columns]
Observation on PP_t_best_1 Model
- At ccp_alpha=0.002683713283947349 and max_depth=6 the performance summary is:
a. Recall values:
i. Recall for PP_t_best_1 on Train Data = 1.0
ii. Recall for PP_t_best_1 on Test Data = 0.98
b. Features with max importance: (a) Income (b) Education_2
i. Less important yet still contributing to predictive power: (a) CCAvg (b) Education_3 (c) Family
c. Confusion matrix:
i. FN at 0.00%
ii. FP at 4.62%
#creating a data frame of alpha, train recall and test recall
df = pd.DataFrame(
    {"ccp_alphas": ccp_alphas, "recall_train": recall_train, "recall_test": recall_test}
)
df
| | ccp_alphas | recall_train | recall_test |
|---|---|---|---|
| 0 | 0.000000e+00 | 1.000000 | 0.85 |
| 1 | 1.346069e-18 | 1.000000 | 0.85 |
| 2 | 1.346069e-18 | 1.000000 | 0.85 |
| 3 | 1.927326e-18 | 1.000000 | 0.85 |
| 4 | 2.141473e-18 | 1.000000 | 0.85 |
| 5 | 2.692138e-18 | 1.000000 | 0.85 |
| 6 | 3.212210e-18 | 1.000000 | 0.85 |
| 7 | 4.129984e-18 | 1.000000 | 0.85 |
| 8 | 7.648118e-18 | 1.000000 | 0.85 |
| 9 | 7.648118e-18 | 1.000000 | 0.85 |
| 10 | 7.908154e-18 | 1.000000 | 0.85 |
| 11 | 2.489602e-17 | 1.000000 | 0.85 |
| 12 | 1.077161e-16 | 1.000000 | 0.85 |
| 13 | 1.265304e-16 | 1.000000 | 0.85 |
| 14 | 1.364192e-04 | 1.000000 | 0.85 |
| 15 | 1.366781e-04 | 1.000000 | 0.85 |
| 16 | 2.521427e-04 | 1.000000 | 0.85 |
| 17 | 2.530267e-04 | 1.000000 | 0.85 |
| 18 | 2.555348e-04 | 1.000000 | 0.87 |
| 19 | 2.618762e-04 | 1.000000 | 0.88 |
| 20 | 2.649322e-04 | 1.000000 | 0.88 |
| 21 | 2.662816e-04 | 1.000000 | 0.88 |
| 22 | 2.715014e-04 | 1.000000 | 0.88 |
| 23 | 3.440114e-04 | 1.000000 | 0.89 |
| 24 | 4.585411e-04 | 1.000000 | 0.89 |
| 25 | 4.821981e-04 | 1.000000 | 0.89 |
| 26 | 5.146570e-04 | 1.000000 | 0.89 |
| 27 | 5.704040e-04 | 1.000000 | 0.89 |
| 28 | 5.918556e-04 | 1.000000 | 0.90 |
| 29 | 6.537767e-04 | 1.000000 | 0.90 |
| 30 | 8.471994e-04 | 1.000000 | 0.92 |
| 31 | 1.030111e-03 | 1.000000 | 0.92 |
| 32 | 1.030872e-03 | 1.000000 | 0.92 |
| 33 | 1.139805e-03 | 1.000000 | 0.93 |
| 34 | 1.563743e-03 | 1.000000 | 0.93 |
| 35 | 1.567932e-03 | 1.000000 | 0.94 |
| 36 | 1.647632e-03 | 1.000000 | 0.94 |
| 37 | 2.012975e-03 | 1.000000 | 0.94 |
| 38 | 2.551760e-03 | 1.000000 | 0.95 |
| 39 | 2.683713e-03 | 1.000000 | 0.98 |
| 40 | 2.819703e-03 | 0.992105 | 0.96 |
| 41 | 3.163018e-03 | 0.992105 | 0.97 |
| 42 | 3.251437e-03 | 0.992105 | 0.97 |
| 43 | 4.793461e-03 | 0.992105 | 0.97 |
| 44 | 2.088058e-02 | 0.950000 | 0.88 |
| 45 | 3.825469e-02 | 0.957895 | 0.91 |
At row 42 above, the train and test recall values occur at an alpha near the 0.003 threshold we saw earlier on the curve.
This value looks like a good fit, as it is not too close to an underfitting model; we will therefore extract it and observe how it behaves on the tree and confusion matrix.
df.iloc[42]
| | 42 |
|---|---|
| ccp_alphas | 0.003251 |
| recall_train | 0.992105 |
| recall_test | 0.970000 |
#extracting the 42nd model and fitting it to the training data
PP_t_best_2 = PP_trees[42]
PP_t_best_2.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=np.float64(0.003251436542468189),
                       class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
Model Performance Evaluation of Post-pruned Tree PP_t_best_2¶
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_2 = get_recall_score(PP_t_best_2, X_train, y_train)
print(f'Recall for PP_t_best_2 on Train Data = {Recall_Train_PP_t_best_2}')
Recall_Test_PP_t_best_2 = get_recall_score(PP_t_best_2, X_test, y_test)
print(f'Recall for PP_t_best_2 on Test Data = {Recall_Test_PP_t_best_2}')
Recall for PP_t_best_2 on Train Data = 0.9921052631578947 Recall for PP_t_best_2 on Test Data = 0.97
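The `get_recall_score` helper is defined earlier in the notebook and not shown here. A minimal sketch of its assumed behaviour, wrapping `sklearn.metrics.recall_score` on the model's predictions (the `AlwaysYes` stub is purely illustrative):

```python
from sklearn.metrics import recall_score

# Minimal stand-in for the notebook's get_recall_score helper (assumed behavior)
def get_recall_score(model, X, y):
    # Recall = TP / (TP + FN); penalizes missed loan accepters (false negatives)
    return recall_score(y, model.predict(X))

# Quick check with a trivial model that predicts every customer as class 1
class AlwaysYes:
    def predict(self, X):
        return [1] * len(X)

demo_recall = get_recall_score(AlwaysYes(), [[0]] * 4, [1, 0, 1, 1])  # no FNs, so recall is 1.0
```

Recall is the right metric here because a false negative means missing a customer who would have taken the loan, which is the costliest error for the campaign.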
Observation
The recall on the train and test sets is almost identical to the values stated in the dataframe, which was expected
Now visualizing the tree, the confusion matrix and feature importance
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_2.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_2, X_train, y_train)
plot_tree(PP_t_best_2,X_train)
view_nd_plot_importance(PP_t_best_2, X_train)
The tree depth is : 5
The feature importances:
Imp
Income 0.684178
Education_2 0.160763
CCAvg 0.058652
Education_3 0.054661
Family 0.041746
... ...
City_Garden Grove 0.000000
City_Fullerton 0.000000
City_Fresno 0.000000
City_Fremont 0.000000
City_Greenbrae 0.000000
[257 rows x 1 columns]
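The importance table above can be reproduced from a fitted tree's `feature_importances_` attribute. A small sketch with toy data and illustrative column names (the real notebook uses the 257 columns of `X_train`):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the bank data; column names are illustrative only
X, y = make_classification(n_samples=200, n_features=4, n_informative=2, random_state=1)
cols = ["Income", "CCAvg", "Family", "Mortgage"]

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# Importances sum to 1; sorting descending reproduces the "Imp" table layout
imp = (
    pd.DataFrame(tree.feature_importances_, index=cols, columns=["Imp"])
    .sort_values("Imp", ascending=False)
)
```

Features never used in a split get an importance of exactly 0, which is why most of the one-hot `City_*` columns appear at the bottom of the table with 0.000000.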
Observation on PP_t_best_2 Model
- At ccp_alpha=0.003251 and max_depth=5, the performance of PP_t_best_2 is better than PP_t_best_1. Although the recall values for train and test are slightly lower, the FN% in the confusion matrix is down to half. The performance summary is:
a. Recall values:
i. Recall for PP_t_best_2 on Train Data = 0.9921052631578947
ii. Recall for PP_t_best_2 on Test Data = 0.97
b. Features with maximum importance:
i. Income 0.684178
ii. Education_2 0.160763
c. Lower importance yet still having a predictive effect:
i. CCAvg 0.058652
ii. Education_3 0.054661
iii. Family 0.041746
Confusion matrix:
FN at 0.07%
FP at 5.88%
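The FN% and FP% figures quoted above are confusion-matrix cell counts expressed as a share of all customers. A sketch of that calculation on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels vs. predictions to show how FN% and FP% are derived
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

# sklearn's confusion matrix is laid out [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

fn_pct = 100 * fn / len(y_true)  # missed loan accepters, as % of all customers
fp_pct = 100 * fp / len(y_true)  # wrongly targeted non-accepters, as % of all customers
```

Reporting the cells as percentages of the whole sample, rather than raw counts, is what makes values like 0.07% and 5.88% directly comparable across the differently sized train and test sets.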
Model Performance Evaluation of Post-pruned Tree PP_t_best_3¶
df.iloc[42]
| 42 | |
|---|---|
| ccp_alphas | 0.003251 |
| recall_train | 0.992105 |
| recall_test | 0.970000 |
#extracting the model from the 42nd model and fitting it to train and test data
PP_t_best_3 = PP_trees[42]
PP_t_best_3.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=np.float64(0.003251436542468189),
                       class_weight={0: 0.094571, 1: 0.905429}, random_state=1)
#Calculating the Recall for train and test data
Recall_Train_PP_t_best_3 = get_recall_score(PP_t_best_3, X_train, y_train)
print(f'Recall for PP_t_best_3 on Train Data = {Recall_Train_PP_t_best_3}')
Recall_Test_PP_t_best_3 = get_recall_score(PP_t_best_3, X_test, y_test)
print(f'Recall for PP_t_best_3 on Test Data = {Recall_Test_PP_t_best_3}')
Recall for PP_t_best_3 on Train Data = 0.9921052631578947 Recall for PP_t_best_3 on Test Data = 0.97
The recall on the train and test data sets looks good
Visualizing the tree, the confusion matrix and feature importance
#plotting the tree, variables importance and the confusion matrix
print(f'The tree depth is : {PP_t_best_3.tree_.max_depth}')
confusion_matrix_sklearn(PP_t_best_3, X_train, y_train)
plot_tree(PP_t_best_3,X_train)
view_nd_plot_importance(PP_t_best_3, X_train)
The tree depth is : 5
The feature importances:
Imp
Income 0.684178
Education_2 0.160763
CCAvg 0.058652
Education_3 0.054661
Family 0.041746
... ...
City_Garden Grove 0.000000
City_Fullerton 0.000000
City_Fresno 0.000000
City_Fremont 0.000000
City_Greenbrae 0.000000
[257 rows x 1 columns]
Observation on PP_t_best_3 Model
At ccp_alpha=0.003251 and max_depth=5, PP_t_best_3 matches the performance of PP_t_best_2, as both were extracted from the same row (42) of the pruning results.
The performance summary is:
Recall values:
Recall for PP_t_best_3 on Train Data = 0.9921052631578947
Recall for PP_t_best_3 on Test Data = 0.97
Features with max importance:
Income 0.684178
Education_2 0.160763
Lower importance yet still having a predictive effect:
CCAvg 0.058652
Education_3 0.054661
Family 0.041746
Confusion matrix:
FN at 0.07%
FP at 5.88%
Decision Tree Conclusion
- The third and final model has a low tree depth of 5, hence a less complex tree obtained with a larger value of alpha, and it successfully avoided overfitting. Hence, the best performing model on the test set is PP_t_best_3 with the below alpha value
At ccp_alpha=0.003251 and max_depth=5, PP_t_best_3 delivers the best test recall of the models evaluated.
The performance summary is:
Recall values:
Recall for PP_t_best_3 on Train Data = 0.9921052631578947
Recall for PP_t_best_3 on Test Data = 0.97
Features with max importance:
Income 0.684178
Education_2 0.160763
Lower importance yet still having a predictive effect:
CCAvg 0.058652
Education_3 0.054661
Family 0.041746
Confusion matrix:
FN at 0.07%
FP at 5.88%
Model Comparison and Final Model Selection¶
Observation and conclusion:
At a threshold of 0.10, the recall for the test and train data sets is 87% and 91% respectively, which is a very good performance by the logistic regression model: it brings the FN rate down to only 1.20% on the test set while maintaining a precision value of approximately 50% for both the test and train datasets.
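Lowering the decision threshold from the default 0.5 to 0.10 is what lets the logistic model trade precision for recall. A self-contained sketch on synthetic imbalanced data (the notebook's actual `lg` model and threshold-tuning code are not shown here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Imbalanced toy data, roughly mimicking the ~10% loan-acceptance rate
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=1)
lg = LogisticRegression(max_iter=1000).fit(X, y)

# predict() uses a 0.5 cutoff; a 0.10 cutoff flags more customers as likely accepters
proba = lg.predict_proba(X)[:, 1]
pred_default = (proba >= 0.5).astype(int)
pred_low = (proba >= 0.10).astype(int)

recall_default = recall_score(y, pred_default)
recall_low = recall_score(y, pred_low)  # never lower than the 0.5-cutoff recall
```

Lowering the threshold can only add predicted positives, so recall is guaranteed to be at least as high as at the default cutoff; the cost is the extra false positives, which is why precision drops toward the ~50% reported above.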
Actionable Insights and Business Recommendations¶
Final models comparison:

| Modelling Algorithm | Model Name | recall (train) | recall (test) |
|---|---|---|---|
| Logistic Regression | lg with threshold 0.1 | 0.92 | 0.88 |
| Decision Tree (Pre-pruned) | t_1 | 0.99 | 0.95 |
| Decision Tree (Post-pruned) | PP_t_best_3 | 0.99 | 0.97 |
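The comparison can also be kept as a small DataFrame, which makes the selection on test recall explicit (recall values copied from the comparison above):

```python
import pandas as pd

# Recall values from the final models comparison
comparison = pd.DataFrame(
    {
        "Model": ["lg (threshold 0.1)", "t_1 (pre-pruned)", "PP_t_best_3 (post-pruned)"],
        "recall_train": [0.92, 0.99, 0.99],
        "recall_test": [0.88, 0.95, 0.97],
    }
)

# Selecting on test recall picks the post-pruned tree as the final model
best = comparison.loc[comparison["recall_test"].idxmax(), "Model"]
```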
Insights:
The best performing model was derived from the Decision Tree modelling technique, where the original tree was post-pruned with ccp_alpha=0.003251 and gave the below recall values for the test and training data sets:
Recall for PP_t_best_3 on Train Data = 0.9921052631578947 and Recall for PP_t_best_3 on Test Data = 0.97
The statistical evidence shows that the features which most affect a client's decision to accept a personal loan are listed in the table below, in order of priority:

| Priority | Feature | Effect on customer |
|---|---|---|
| 1 | Income | The higher the income, the higher the chance that the customer will accept a personal loan |
| 2 | Education_2 | Customers with Education level 2 are more willing to accept a personal loan than levels 1 and 3 |
| 3 | CCAvg | As average monthly credit card spending increases, customers are more willing to accept a personal loan |
| 4 | Education_3 | Customers with Education level 3 are more willing to accept a personal loan than level 1 |
| 5 | Family | As family size grows, customers are more willing to accept a personal loan |
What recommendations would you suggest to the bank?¶
The marketing department should study customer profiles before approaching them with a personal loan offer.
AllLife Bank should apply various strategies to sell more personal loan packages, especially by using dedicated relationship managers for high-profile customers.
The bank should also consider monthly/quarterly follow-ups with average- to mid-profile customers to see how to attract more of them to take personal loans.
The bank should consider rigorous mail marketing directed at high- and mid-profile customers so that they are properly informed about getting pre-approved for personal loans.
Income is also seen as the most important feature in the decision tree model. If a customer's yearly income is less than USD 92,500, it is most likely the customer would not accept a personal loan.
Consequently, customers with an income greater than USD 92,500 and with an education level greater than or equal to 3 (Advanced/Professional) were most likely to have a personal loan, so targeted marketing to this group is very important.
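The targeting rule above can be expressed as a simple pandas filter. The customer table here is purely illustrative; only the column names follow the data dictionary:

```python
import pandas as pd

# Tiny illustrative customer table (Income in thousand dollars, per the data dictionary)
customers = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Income": [60, 120, 150, 95],
        "Education": [1, 3, 2, 3],
    }
)

# Recommended segment: income above ~$92.5k and Advanced/Professional education
target = customers[(customers["Income"] > 92.5) & (customers["Education"] >= 3)]
```

A marketing list like this could then be prioritized further by CCAvg and Family size, the next most important features in the tree.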
Finally, customers using online banking services were more likely to have personal loans. The bank should therefore improve the website, make it more user-friendly, and encourage customers who do not yet use online banking to adopt it.
Making the personal loan application process on the website available on mobile devices and reducing online applications to a few clicks would improve the customer experience when applying for a personal loan.
The location of customers' residence does not have any real impact on their decision to accept personal loans, as long as the bank does the things mentioned in points 1-8 above.
!pip install nbconvert
%%shell
jupyter nbconvert --to html '/content/drive/My Drive/Machine_Learning_SL_Full_Code_Akomolafe_Samson_Updated_a.ipynb'